Finding All Occurrence Indexes of a Character in Java Strings

Keywords: Java String Processing | indexOf Method | Character Index Search | Loop Traversal | Boyer-Moore Algorithm

Abstract: This paper examines methods for locating every occurrence of a specific character in a Java string. It explains how the indexOf method works, presents two loop-based implementations (while and for), and compares them. It also discusses performance considerations when the search target is a multi-character substring and briefly notes when the Boyer-Moore algorithm may be worthwhile.

Problem Background and Core Requirements

In string processing, it is often necessary to find every position at which a specific character or substring occurs. For example, given the string "bannanas" and the search character "n", a single call to indexOf returns only the first match (index 2), whereas the requirement is a list of all matching indexes: [2, 3, 5].

Loop Implementation Based on the indexOf Method

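Java's String.indexOf is overloaded for both a character argument (indexOf(int ch)) and a substring argument (indexOf(String str)). Each comes in a one-argument form that searches from the beginning of the string and a two-argument form that starts at a given fromIndex. All of them return -1 when there is no match. Because the two-argument form can resume a search just past the previous match, a loop can use it to visit every matching position.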
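The following short demo calls these overloads on the example string. It is only an illustration: the class name IndexOfOverloads is hypothetical, and the commented values follow from the character positions in "bannanas".

```java
public class IndexOfOverloads {
    public static void main(String[] args) {
        String word = "bannanas";
        System.out.println(word.indexOf('n'));      // 2  (char, searched from the start)
        System.out.println(word.indexOf('n', 3));   // 3  (char, searched from index 3, inclusive)
        System.out.println(word.indexOf("an"));     // 1  (substring, searched from the start)
        System.out.println(word.indexOf("an", 2));  // 4  (substring, searched from index 2)
        System.out.println(word.indexOf('z'));      // -1 (no match)
    }
}
```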

While Loop Implementation

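String word = "bannanas";
char guess = 'n';
// First match, or -1 if guess does not occur at all
int index = word.indexOf(guess);
while (index >= 0) {
    System.out.println(index);                // prints 2, then 3, then 5
    index = word.indexOf(guess, index + 1);   // resume just past the last match
}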

This implementation first obtains the first matching position, then repeatedly searches starting one character past the previous match, and stops when indexOf returns -1 (no further matches). Because the loop condition is tested before each index is used, the final -1 is never printed or stored, which avoids the common bug of -1 ending up appended to the result list.
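To make the -1 pitfall concrete, the sketch below contrasts a hypothetical do-while version, which stores each result before checking it, with the test-first while loop. The class and method names are illustrative only and do not come from any library.

```java
import java.util.ArrayList;
import java.util.List;

public class SentinelPitfall {
    // Hypothetical anti-pattern: the result is stored before the loop condition
    // is checked, so the final -1 returned by indexOf ends up in the list.
    static List<Integer> doWhileVersion(String word, char guess) {
        List<Integer> indexes = new ArrayList<>();
        int index = -1;
        do {
            index = word.indexOf(guess, index + 1);
            indexes.add(index);
        } while (index >= 0);
        return indexes;
    }

    // Test-first version with the same shape as the while loop above:
    // -1 ends the loop and is never stored.
    static List<Integer> whileVersion(String word, char guess) {
        List<Integer> indexes = new ArrayList<>();
        int index = word.indexOf(guess);
        while (index >= 0) {
            indexes.add(index);
            index = word.indexOf(guess, index + 1);
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(doWhileVersion("bannanas", 'n')); // [2, 3, 5, -1]
        System.out.println(whileVersion("bannanas", 'n'));   // [2, 3, 5]
    }
}
```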

For Loop Implementation

for (int index = word.indexOf(guess);
     index >= 0;
     index = word.indexOf(guess, index + 1))
{
    System.out.println(index);
}

The for loop version gathers the initialization, the continuation test, and the update into the loop header, which makes the code more compact and confines index to the loop's scope. The two versions behave identically; the choice is a matter of coding style.
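On Java 9 or later, the same seed, condition, and update can also be passed to IntStream.iterate(seed, hasNext, next), which is documented to behave like the corresponding for loop. This is an optional alternative, sketched with a hypothetical indexesOf helper that returns an int[] instead of printing each index.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class StreamIndexes {
    // Same seed / condition / update triple as the for loop header.
    // The three-argument IntStream.iterate overload requires Java 9+.
    static int[] indexesOf(String word, char guess) {
        return IntStream.iterate(word.indexOf(guess),          // first match or -1
                                 i -> i >= 0,                    // stop once indexOf returns -1
                                 i -> word.indexOf(guess, i + 1)) // resume past the last match
                        .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(indexesOf("bannanas", 'n'))); // [2, 3, 5]
    }
}
```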

Performance Analysis and Optimization Considerations

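When guess is a single character, each indexOf call resumes just past the previous match, so the loop as a whole scans the string roughly once and performance is rarely a concern. If guess is a multi-character substring of length m and the string has length n, however, a straightforward search may compare up to m characters at each of the n starting positions, so a plain loop may not be the best choice for long inputs.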
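A behavioral detail also appears once the target has more than one character: resuming at index + 1 reports overlapping matches, while resuming at index + target.length() does not. The sketch below uses a hypothetical findAll helper with an overlapping flag to show the difference on "aaaa". It rejects an empty target, because indexOf finds the empty string at every position and the loop could then run forever.

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringMatches {
    // Collects the start positions of target in str. The step after each match
    // decides whether overlapping matches are reported:
    //   overlapping = true  -> resume at index + 1
    //   overlapping = false -> resume at index + target.length()
    static List<Integer> findAll(String str, String target, boolean overlapping) {
        if (target.isEmpty()) {
            // indexOf("") matches everywhere, so the loop below might never end
            throw new IllegalArgumentException("target must not be empty");
        }
        List<Integer> result = new ArrayList<>();
        int step = overlapping ? 1 : target.length();
        for (int index = str.indexOf(target); index >= 0; index = str.indexOf(target, index + step)) {
            result.add(index);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findAll("aaaa", "aa", true));   // [0, 1, 2]
        System.out.println(findAll("aaaa", "aa", false));  // [0, 2]
    }
}
```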

For comparison, here is the same pattern in Perl:

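use strict;
use warnings;

my $string = 'every occurrence of a substring in a string';
my $target = 'st';   # a substring, not a single character
my $offset = 0;
# index() returns -1 when there is no match at or after $offset
my $result = index($string, $target, $offset);
while ($result != -1) {
    print "Found $target at $result\n";
    $offset = $result + 1;   # resume one character past the last match
    $result = index($string, $target, $offset);
}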

The structure mirrors the Java while loop: Perl's built-in index takes a string, a target, and a starting offset, returns -1 when nothing is found, and the loop advances the offset past each match.

Advanced Algorithm Application Scenarios

For longer patterns and large texts, more efficient string search algorithms can be considered. The Boyer-Moore algorithm preprocesses the pattern (the search target) so that, on a mismatch, it can shift the pattern forward by several positions instead of one; on typical text it therefore inspects only a fraction of the characters. In most everyday code, however, a simple indexOf loop is sufficient, and such algorithms are worth considering only for large inputs or measured performance bottlenecks.
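As a rough illustration of the skipping idea, the sketch below implements Boyer-Moore-Horspool, a simplified variant that keeps only the bad-character shift rule of full Boyer-Moore and omits the good-suffix rule. The class and method names are hypothetical. It indexes its shift table by char value, so it works on UTF-16 code units and allocates a 65,536-entry table on every call; a production version would likely cache the table per pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HorspoolSearch {
    // Returns every start index of pattern in text, overlapping matches included.
    public static List<Integer> findAll(String text, String pattern) {
        List<Integer> result = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        if (m == 0 || m > n) {
            return result;
        }
        // Bad-character table: by default the pattern can jump its full length.
        int[] shift = new int[Character.MAX_VALUE + 1];
        Arrays.fill(shift, m);
        // For each char except the last, store its distance from the pattern's end
        // (later occurrences overwrite earlier ones, so the smallest distance wins).
        for (int i = 0; i < m - 1; i++) {
            shift[pattern.charAt(i)] = m - 1 - i;
        }
        int pos = 0;
        while (pos <= n - m) {
            // Compare the current window right to left.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) {
                result.add(pos);
            }
            // Shift by the entry for the text char under the pattern's last position.
            pos += shift[text.charAt(pos + m - 1)];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findAll("bannanas", "an")); // [1, 4]
        System.out.println(findAll("aaaa", "aa"));     // [0, 1, 2]
    }
}
```

Every shift is at least 1, so the search always makes progress, and the results agree with an indexOf loop that resumes at index + 1.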

Practical Application Recommendations

In practice, it is advisable to encapsulate the search logic in a separate method:

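import java.util.ArrayList;
import java.util.List;

// Returns the start index of every (possibly overlapping) occurrence of target in str.
public static List<Integer> findAllIndexes(String str, String target) {
    List<Integer> indexes = new ArrayList<>();
    if (target.isEmpty()) {
        return indexes; // indexOf("") matches everywhere, which would make the loop below never end
    }
    int index = str.indexOf(target);
    while (index >= 0) {
        indexes.add(index);
        index = str.indexOf(target, index + 1); // +1 allows overlapping matches
    }
    return indexes;
}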

Wrapping the loop in a method makes it reusable, lets it be unit-tested in isolation, and allows a faster search algorithm to be substituted later without changing its callers.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.