Keywords: Java | string parsing | performance comparison
Abstract: This paper provides an in-depth comparison of three Java string parsing tools: Scanner, StringTokenizer, and String.split. It examines their API designs, performance characteristics, and practical use cases, highlighting Scanner's advantages in type parsing and stream processing, String.split's simplicity for regex-based splitting, and StringTokenizer's limitations as a legacy class. Code examples and performance data are included to guide developers in selecting the appropriate tool.
Introduction
String parsing is a common task in Java programming, and several tools are available for it. Drawing on technical Q&A data, this paper systematically analyzes the features of java.util.Scanner, StringTokenizer, and String.split() to help readers understand where each one fits.
Core Tool Overview
The Scanner class is designed for parsing strings and extracting data of different types, supporting flexible delimiters and type conversions. For example, when handling mixed data, methods like hasNextDouble() and nextDouble() can directly retrieve numerical values:
import java.util.Scanner;

Scanner scanner = new Scanner("123 45.6 text");
while (scanner.hasNext()) {
    // hasNextDouble() honors the scanner's locale (the default locale unless useLocale() is called)
    if (scanner.hasNextDouble()) {
        double num = scanner.nextDouble();
        System.out.println("Number: " + num);
    } else {
        // Anything that is not a number is consumed as a plain token
        String token = scanner.next();
        System.out.println("Text: " + token);
    }
}
In contrast, String.split() splits a string around matches of a regular expression and returns an array, which suits simple splitting tasks. For instance, splitting on runs of whitespace: String[] tokens = input.split("\\s+");. StringTokenizer, a legacy class, accepts only a fixed set of delimiter characters rather than a regular expression; it can be faster, but its functionality is limited.
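The functional difference between the two splitting tools can be sketched as follows. This is a minimal illustration, not a reference implementation; the class and method names (SplitVsTokenizer, withSplit, withTokenizer) are hypothetical and chosen only for this example. It shows one practical consequence of StringTokenizer's simpler model: it silently skips empty tokens between adjacent delimiters, while String.split() keeps interior empty fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the names are illustrative only.
class SplitVsTokenizer {

    // Split around matches of a regular expression; interior empty fields are kept.
    static List<String> withSplit(String input, String regex) {
        return Arrays.asList(input.split(regex));
    }

    // Tokenize on a fixed set of delimiter characters; empty tokens are never returned.
    static List<String> withTokenizer(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "a,b,,c";
        System.out.println(withSplit(input, ","));     // prints [a, b, , c]
        System.out.println(withTokenizer(input, ",")); // prints [a, b, c]
    }
}
```

If empty fields carry meaning in the input (for example, a missing column), this difference alone may decide which tool is usable.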
Performance and API Comparison
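According to the performance data from the Q&A discussion, StringTokenizer is about twice as fast as String.split() for fixed delimiters, although String.split() can still process thousands of strings in milliseconds. In terms of API design, Scanner provides an iterator-style interface (hasNext()/next() and typed variants such as nextDouble()) for parsing input as it is read; String.split() returns an array that is easy to loop over; StringTokenizer implements Enumeration and is consumed through hasMoreTokens()/nextToken(), which is more verbose. The official Javadoc describes StringTokenizer as a legacy class whose use is discouraged in new code and recommends String.split() or the java.util.regex package instead.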
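Readers who want to check the relative cost on their own JDK could start from a rough timing sketch like the one below. It is an assumption-laden illustration rather than a rigorous benchmark: the class name TokenizeTiming, the sample line, and the iteration count are all hypothetical, and a harness such as JMH would be needed for trustworthy numbers. The measured ratio depends on the JDK version, the input, and JIT warm-up, so no expected timings are shown.

```java
import java.util.StringTokenizer;
import java.util.function.ToIntFunction;

// Hypothetical timing sketch; not a substitute for a real benchmark harness.
class TokenizeTiming {

    // Count fields by splitting on a comma.
    static int countWithSplit(String line) {
        return line.split(",").length;
    }

    // Count fields by walking a StringTokenizer over the same delimiter.
    static int countWithTokenizer(String line) {
        StringTokenizer st = new StringTokenizer(line, ",");
        int n = 0;
        while (st.hasMoreTokens()) {
            st.nextToken();
            n++;
        }
        return n;
    }

    // Run the parser repeatedly and return the elapsed time in nanoseconds.
    static long timeNanos(ToIntFunction<String> parser, String line, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += parser.applyAsInt(line);
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            // Unreachable for non-negative counts; keeps the result in use.
            throw new IllegalStateException("unexpected negative count");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String line = "alpha,beta,gamma,delta,epsilon"; // assumed sample input
        int iterations = 1_000_000;                     // assumed workload size

        // Warm-up passes so the JIT has a chance to compile both paths.
        timeNanos(TokenizeTiming::countWithSplit, line, iterations);
        timeNanos(TokenizeTiming::countWithTokenizer, line, iterations);

        long splitNanos = timeNanos(TokenizeTiming::countWithSplit, line, iterations);
        long tokenizerNanos = timeNanos(TokenizeTiming::countWithTokenizer, line, iterations);
        System.out.printf("split: %d ms, StringTokenizer: %d ms%n",
                splitNanos / 1_000_000, tokenizerNanos / 1_000_000);
    }
}
```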
Application Scenario Analysis
Scanner is suitable for scenarios that require mixed-type parsing or stream input, such as command-line input or text file parsing. String.split() fits simple splitting tasks, especially when array output is needed. For example, splitting a simple CSV line: String[] fields = line.split(",");. Note that this form discards trailing empty fields and does not handle quoted fields that contain commas. StringTokenizer is rarely used in modern code and is worth considering only when extreme performance is required and the delimiters are fixed.
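The two recommended tools can be contrasted on a comma-separated record with the sketch below. The class name CsvFieldDemo and the (int, double, String) record layout are assumptions made for illustration. The first method uses the two-argument split(regex, limit) overload with a negative limit so that trailing empty fields are kept; the second uses Scanner with a custom delimiter and an explicit locale to read typed fields directly.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class; the record layout is an assumption for this example.
class CsvFieldDemo {

    // A negative limit tells split() to keep trailing empty fields.
    static String[] splitKeepingEmpties(String line) {
        return line.split(",", -1);
    }

    // Read an (int, double, String) record with typed Scanner methods.
    static String describe(String line) {
        try (Scanner sc = new Scanner(line)) {
            // Fix the locale so "3.14" parses the same way on every machine.
            sc.useDelimiter(",").useLocale(Locale.US);
            int id = sc.nextInt();
            double value = sc.nextDouble();
            String label = sc.next();
            return id + "|" + value + "|" + label;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("a,b,,".split(",")));           // prints [a, b]
        System.out.println(Arrays.toString(splitKeepingEmpties("a,b,,"))); // prints [a, b, , ]
        System.out.println(describe("42,3.14,hello"));                     // prints 42|3.14|hello
    }
}
```

Neither approach handles quoted fields with embedded commas; real CSV input with quoting generally calls for a dedicated parser.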
Conclusion
Tool selection should be based on specific needs: Scanner offers multifunctional, type-aware parsing, String.split() excels in simplicity, and StringTokenizer is gradually being phased out because of its limitations. Developers can make informed decisions by weighing performance data together with API characteristics.