In-depth Analysis of Using String.split() with Multiple Delimiters in Java

Nov 13, 2025 · Programming

Keywords: Java string splitting | regex OR operator | multiple delimiter handling

Abstract: This article provides a comprehensive exploration of the String.split() method in Java for handling string splitting with multiple delimiters. Through detailed analysis of regex OR operator usage, it explains how to correctly split strings containing hyphens and dots. The article compares incorrect and correct implementations with concrete code examples, and extends the discussion to similar solutions in other programming languages. Content covers regex fundamentals, delimiter matching principles, and performance optimization recommendations, offering developers complete technical guidance.

Problem Background and Requirements Analysis

String splitting is a common requirement in Java programming. Suppose the string AA.BB-CC-DD.zip must be split using both the hyphen - and the dot . as delimiters, with the goal of obtaining five separate parts: AA, BB, CC, DD, and zip. The initial, incorrect implementation split("-\\.") fails to achieve this: it returns the whole string as a single element, because of a misunderstanding of how the regex is matched.

Correct Application of Regex OR Operator

Java's String.split() method implements splitting functionality based on regular expressions. When multiple delimiters need to be matched, the OR operator | must be used to construct the correct regex pattern. The erroneous code split("-\\.") actually matches the consecutive character combination -., rather than independent - or . characters.
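The difference between the two patterns can be checked directly. The following is a minimal sketch; the class and method names are illustrative and not part of the original code.

```java
import java.util.Arrays;

final class WrongVersusRightPattern {
    // Sequence pattern: only the two-character text "-." counts as a delimiter.
    static String[] splitOnSequence(String input) {
        return input.split("-\\.");
    }

    // Alternation pattern: a hyphen OR a dot counts as a delimiter.
    static String[] splitOnEither(String input) {
        return input.split("-|\\.");
    }

    public static void main(String[] args) {
        String name = "AA.BB-CC-DD.zip";
        // "-." never occurs in the input, so nothing is split.
        System.out.println(Arrays.toString(splitOnSequence(name))); // [AA.BB-CC-DD.zip]
        System.out.println(Arrays.toString(splitOnEither(name)));   // [AA, BB, CC, DD, zip]
    }
}
```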

The correct implementation should be:

private void getId(String pdfName) {
    String[] tokens = pdfName.split("-|\\.");
}

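In this regex pattern "-|\\.", the - matches a literal hyphen, the | is the OR (alternation) operator, and \\. matches a literal dot. The hyphen needs no escaping here because it is only special inside a character class.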

Detailed Explanation of Regex Escaping Mechanism

In Java regular expressions, the dot character . has special meaning: by default it matches any single character except a line terminator. Therefore, when a literal dot must be matched, it has to be escaped with a backslash. Since Java string literals themselves use the backslash as an escape character, the backslash must be doubled in source code, so the pattern is written as "\\." in the string literal.
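The effect of escaping can be seen in a short sketch. The class and method names below are illustrative; the character class [-.] and Pattern.quote() are standard alternatives to manual escaping rather than something the original code uses.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class DotEscapingDemo {
    // Unescaped dot: every character matches as a delimiter, so only empty
    // tokens are produced, and split() discards them as trailing empty strings.
    static String[] splitUnescaped(String input) {
        return input.split(".");
    }

    // Inside a character class the dot is already literal, so no backslash is needed.
    static String[] splitWithCharClass(String input) {
        return input.split("[-.]");
    }

    // Pattern.quote() produces a literal pattern for each delimiter.
    static String[] splitWithQuote(String input) {
        String regex = Pattern.quote("-") + "|" + Pattern.quote(".");
        return input.split(regex);
    }

    public static void main(String[] args) {
        String name = "AA.BB-CC-DD.zip";
        System.out.println(splitUnescaped(name).length);                 // 0
        System.out.println(Arrays.toString(splitWithCharClass(name)));   // [AA, BB, CC, DD, zip]
        System.out.println(Arrays.toString(splitWithQuote(name)));       // [AA, BB, CC, DD, zip]
    }
}
```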

For the splitting process of string AA.BB-CC-DD.zip:

  1. The regex engine scans the entire string
  2. Splitting occurs when encountering dot . or hyphen -
  3. Splitting at the dot between AA and BB
  4. Splitting at the hyphen between BB and CC
  5. Splitting at the hyphen between CC and DD
  6. Splitting at the dot between DD and zip
  7. Finally obtaining five separate substrings
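The steps above can be traced with a Matcher, which reports where each delimiter is found. This is a sketch for illustration only; the class and method names are hypothetical, and split() does not expose these positions itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SplitWalkthrough {
    // Collects the index of every delimiter match, in scan order.
    static List<Integer> delimiterPositions(String input) {
        Matcher matcher = Pattern.compile("-|\\.").matcher(input);
        List<Integer> positions = new ArrayList<>();
        while (matcher.find()) {
            positions.add(matcher.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String name = "AA.BB-CC-DD.zip";
        // Delimiters sit between AA/BB, BB/CC, CC/DD and DD/zip.
        System.out.println(delimiterPositions(name));               // [2, 5, 8, 11]
        System.out.println(Arrays.toString(name.split("-|\\.")));   // [AA, BB, CC, DD, zip]
    }
}
```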

Multi-language Solution Comparison

Examining implementation approaches in other programming languages can deepen understanding of multi-delimiter processing. In Python, similar string splitting can be achieved through:

def split_multiple_delimiters(input_string, delimiters):
    # Replace delimiters with spaces, then split by space
    for delimiter in delimiters:
        input_string = input_string.replace(delimiter, ' ')
    return input_string.split()

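While this method is intuitive, it has two drawbacks: it also splits on any whitespace already present in the input, and it makes one pass over the string per delimiter, so it may be less efficient than a single regex when there are many delimiters or large volumes of strings. Python's re.split() with a pattern such as [-.] is the closer equivalent of Java's regex-based split().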

Performance Optimization and Best Practices

In practical development, if the same splitting operation needs to be performed frequently, precompiling the regular expression is recommended:

private static final Pattern DELIMITER_PATTERN = Pattern.compile("-|\\.");

private void getId(String pdfName) {
    String[] tokens = DELIMITER_PATTERN.split(pdfName);
}

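The main advantage of this approach is that the pattern is compiled once, when the class is loaded, rather than on each call; String.split() with a regex like this one typically compiles a new Pattern every time it is invoked. It also keeps the delimiter definition in a single named constant.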
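As a usage sketch, the precompiled pattern can be wrapped in a small helper that returns the tokens and is reused across many inputs. The class name, method name, and the second file name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class FileNameTokenizer {
    // Compiled once and shared by every call.
    private static final Pattern DELIMITER_PATTERN = Pattern.compile("-|\\.");

    // Returns the parts of fileName separated by hyphens or dots.
    static List<String> tokens(String fileName) {
        return Arrays.asList(DELIMITER_PATTERN.split(fileName));
    }

    public static void main(String[] args) {
        String[] names = {"AA.BB-CC-DD.zip", "report-2024.final.pdf"};
        for (String name : names) {
            System.out.println(name + " -> " + tokens(name));
        }
    }
}
```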

Edge Case Handling

In practical applications, various edge cases need consideration:

// Handling consecutive delimiters
String test1 = "AA..BB--CC";
String[] result1 = test1.split("-|\\.");
// Result: ["AA", "", "BB", "", "CC"]

// Treating a run of consecutive delimiters as one delimiter to avoid empty strings
String[] result2 = test1.split("[-.]+");
// Result: ["AA", "BB", "CC"]

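Whether to keep or filter the empty strings produced by consecutive delimiters depends on the specific requirements. Note that split() discards trailing empty strings by default; passing a negative limit, as in split(regex, -1), keeps them.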
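The following sketch shows how leading and trailing delimiters behave and one way to filter empty tokens. The input "-AA..BB." and the class and method names are made up for illustration.

```java
import java.util.Arrays;

final class EdgeCaseSplitDemo {
    // Default behaviour: trailing empty strings are dropped, leading ones are kept.
    static String[] defaultSplit(String input) {
        return input.split("[-.]");
    }

    // A negative limit keeps every empty string, including trailing ones.
    static String[] keepAllEmpties(String input) {
        return input.split("[-.]", -1);
    }

    // Filters out empty tokens wherever they occur.
    static String[] nonEmptyTokens(String input) {
        return Arrays.stream(input.split("[-.]"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String input = "-AA..BB.";
        System.out.println(Arrays.toString(defaultSplit(input)));   // [, AA, , BB]
        System.out.println(Arrays.toString(keepAllEmpties(input))); // [, AA, , BB, ]
        System.out.println(Arrays.toString(nonEmptyTokens(input))); // [AA, BB]
    }
}
```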

Extended Application Scenarios

Multi-delimiter splitting applies well beyond simple filename parsing, to any input whose fields are separated by more than one kind of delimiter.

By mastering the correct usage of regex OR operators, developers can efficiently handle various complex string splitting requirements, improving code quality and performance.
