Research on Word Counting Methods in Java Strings Using Character Traversal

Keywords: Java | String Processing | Word Counting

Abstract: This paper delves into technical solutions for counting words in Java strings using only basic string methods. By analyzing the character state machine model, it elaborates on how to accurately identify word boundaries and perform counting with fundamental methods like charAt and length, combined with loop structures. The article compares the pros and cons of various implementation strategies, provides complete code examples and performance analysis, offering practical technical references for string processing.

Technical Background and Problem Definition

In Java programming, counting the words in a string is a common fundamental task. The requirement considered here permits only basic methods of the String class, such as charAt, length, or substring, together with loops and conditional statements. This constraint rules out higher-level APIs such as split, so the developer has to work directly with the individual characters of the string.

Core Algorithm Design

Character traversal driven by a small state machine is an effective way to solve this problem. The method keeps one state variable that records whether the scan is currently inside a word, which is enough to identify where each word starts and ends. Two states are defined: OUT (the scan is outside a word, i.e., in a separator region) and IN (the scan is inside a word). The core logic is as follows. Traverse the string one character at a time. When a letter is encountered while the state is OUT, a new word has started: increment the count and switch the state to IN. When a non-letter is encountered while the state is IN, the word has ended: switch the state to OUT. In every other case the state is left unchanged. Because a word is counted only on the OUT-to-IN transition, runs of consecutive spaces and leading or trailing spaces do not produce extra counts.
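The two-state logic described above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the class name StateMachineWordCount and the State enum are hypothetical names chosen for the example, and Character.isLetter is assumed as the test for a word character.

```java
class StateMachineWordCount {

    // The two states named in the text: outside a word and inside a word.
    private enum State { OUT, IN }

    // Counts words by incrementing on every OUT -> IN transition.
    static int countWords(String s) {
        int count = 0;
        State state = State.OUT;
        for (int i = 0; i < s.length(); i++) {
            // Assumption: a "word character" is whatever Character.isLetter accepts.
            boolean letter = Character.isLetter(s.charAt(i));
            if (letter && state == State.OUT) {
                count++;            // a new word starts here
                state = State.IN;
            } else if (!letter && state == State.IN) {
                state = State.OUT;  // the current word has ended
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  hello   world  ")); // prints 2
        System.out.println(countWords(""));                  // prints 0
    }
}
```

Counting on entry to a word means the last word needs no special handling at the end of the string, at the cost of one extra state comparison per character.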

Detailed Java Code Implementation

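The following Java implementation applies the same idea using only charAt and length from the String class, plus the static helper Character.isLetter to classify characters. Note that it counts a word when the word ends rather than when it starts, which is why a letter in the final position needs its own branch: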

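public static int countWords(String s) {
    int wordCount = 0;
    boolean word = false;            // true while the scan is inside a word
    int endOfLine = s.length() - 1;  // index of the last character (-1 for an empty string)

    for (int i = 0; i < s.length(); i++) {
        if (Character.isLetter(s.charAt(i)) && i != endOfLine) {
            // A letter before the final position: the scan is inside a word.
            word = true;
        } else if (!Character.isLetter(s.charAt(i)) && word) {
            // First non-letter after a word: the word has ended, so count it.
            wordCount++;
            word = false;
        } else if (Character.isLetter(s.charAt(i)) && i == endOfLine) {
            // A letter in the final position closes the last word.
            wordCount++;
        }
    }
    return wordCount;
}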

Code analysis: First, initialize the word count wordCount to 0 and the state flag word to false. Traverse each character of the string using charAt(i) to get the current character. If the current character is a letter and not at the end of the string, set word to true, marking entry into word state. If the current character is not a letter and word is true, it indicates the end of a word, increment the count and reset the state. For letter characters at the end of the string, handle them separately to ensure the last word is counted. The time complexity of this method is O(n), where n is the string length, and the space complexity is O(1), using only a few variables.

Comparative Analysis with Other Methods

Other implementations, such as those built on the split method, are more concise but rely on regular expressions and do not meet the constraints of this problem. A call such as split("\\s+") has to process the regular expression and allocate an array plus one substring per word, whereas character traversal reads the original string in place and allocates nothing. The traversal approach is therefore likely to be the cheaper of the two, particularly in resource-constrained environments, although the difference is rarely significant for short inputs. The two approaches also define a word differently: splitting on whitespace treats "a,b" as one word, while a letter-based traversal treats it as two.
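For comparison only, the sketch below places a split-based counter next to a traversal-based one. The class and method names are hypothetical, the whitespace pattern "\\s+" is an assumed choice of separator, and the split version falls outside the constraints of this problem.

```java
class SplitVersusTraversal {

    // Split-based count, shown only for comparison.
    // Assumption: words are separated by runs of whitespace.
    static int countWithSplit(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return 0; // "".split("\\s+") would return one empty element
        }
        return trimmed.split("\\s+").length;
    }

    // Traversal-based count using only charAt and length on the string.
    // Assumption: a word is a maximal run of Character.isLetter characters.
    static int countWithTraversal(String s) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < s.length(); i++) {
            boolean letter = Character.isLetter(s.charAt(i));
            if (letter && !inWord) {
                count++; // entering a new word
            }
            inWord = letter;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "  one  two three ";
        System.out.println(countWithSplit(text));      // prints 3
        System.out.println(countWithTraversal(text));  // prints 3
        // The two definitions of a word diverge on punctuation:
        System.out.println(countWithSplit("a,b"));     // prints 1
        System.out.println(countWithTraversal("a,b")); // prints 2
    }
}
```

The explicit empty-string check in the split version is needed because split returns a one-element array when the input contains no separator, even if that input is empty.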

Boundary Conditions and Optimization Suggestions

In practical applications, several boundary conditions must be considered. An empty string should return 0, and so should a string containing only spaces, since no letter ever starts a word. A null argument throws a NullPointerException at the first call to length() unless it is checked beforehand. Digits and punctuation are treated as separators by Character.isLetter, so "room 101" counts as one word and "don't" as two; replacing Character.isLetter with Character.isLetterOrDigit includes digits if the requirements call for it. For optimization, storing the string length in a local variable avoids repeated calls to length(), though modern JVM optimizations typically make the difference negligible. For very long strings, copying the characters into an array may reduce the per-character cost of charAt, but it adds an O(n) allocation and steps outside the constraints of this problem, so the simple traversal should be preferred here.
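A sketch of how these boundary conditions might be handled is given below. The class name, the includeDigits parameter, and the decision to return 0 for a null argument are all assumptions made for illustration; they are not part of the original requirement.

```java
class BoundaryAwareWordCount {

    // Counts words; when includeDigits is true, digits count as word characters.
    static int countWords(String s, boolean includeDigits) {
        if (s == null) {
            return 0; // assumption: treat null like an empty string
        }
        int n = s.length(); // length cached once, as suggested in the text
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            boolean wordChar = includeDigits
                    ? Character.isLetterOrDigit(c)
                    : Character.isLetter(c);
            if (wordChar && !inWord) {
                count++; // first character of a new word
            }
            inWord = wordChar;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("", false));         // prints 0
        System.out.println(countWords("   ", false));      // prints 0
        System.out.println(countWords("room 101", false)); // prints 1
        System.out.println(countWords("room 101", true));  // prints 2
    }
}
```

Passing the character test as a flag keeps the loop identical for both definitions of a word, so the choice between letters only and letters plus digits stays in one place.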

Conclusion

This paper elaborates in detail on the character traversal-based method for counting words in Java strings, accurately identifying word boundaries through a state machine model. This method not only meets the problem constraints but also demonstrates the efficiency of low-level string processing. Developers can adjust the character judgment logic based on actual needs, extending its application to more complex text analysis scenarios.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.