Reading CSV Files with Scanner: Common Issues and Proper Implementation

Nov 26, 2025 · Programming

Keywords: Java | CSV Parsing | Scanner Class | File Reading | Delimiter

Abstract: This article provides an in-depth analysis of common problems encountered when using Java's Scanner class to read CSV files, particularly the issue of spaces causing incorrect line breaks. By examining the root causes, it presents the correct solution using the useDelimiter() method and explores the complexities of CSV format. The article also introduces professional CSV parsing libraries as alternatives, helping developers avoid common pitfalls and achieve reliable CSV data processing.

Problem Analysis

When using Java's Scanner class to read CSV files, developers often encounter a typical issue: text fields containing spaces are incorrectly split across different lines. This phenomenon stems from the default behavior of the Scanner class, which uses whitespace characters (including spaces, tabs, and line breaks) as delimiters.

The Default Delimiter Issue

When next() is called without an explicitly set delimiter, Scanner splits the input on whitespace. Consider the following CSV data:

first,last,email,address 1,address 2
john,smith,blah@blah.com,123 St. Street,
Jane,Smith,blech@blech.com,4455 Roger Cir,apt 2

Because every space ends a token, the space in the header field "address 1" is treated as a delimiter, as are the spaces inside the address values. Printing each token on its own line produces:

first,last,email,address
1,address
2
john,smith,blah@blah.com,123
St.
Street,
Jane,Smith,blech@blech.com,4455
Roger
Cir,apt
2
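
The splitting shown above can be reproduced without a file. The following is a minimal sketch, assuming a hypothetical class name (DefaultDelimiterDemo) and an in-memory string in place of the CSV file. It collects the tokens that next() returns while the default whitespace delimiter is in effect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; the name is an assumption, not part of the original article.
class DefaultDelimiterDemo {

    // Returns the tokens Scanner produces with its default (whitespace) delimiter.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One row from the sample data; commas are not delimiters here, spaces are.
        for (String token : tokens("john,smith,blah@blah.com,123 St. Street,")) {
            System.out.println(token);
        }
    }
}
```

The single row comes back as three tokens, split at the two spaces inside the street address, while the commas stay embedded in the tokens.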

Correct Solution

To keep spaces inside fields, call useDelimiter() so that the Scanner splits on commas instead of whitespace:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class CSVReader {
    public static void main(String[] args) throws FileNotFoundException {
        // try-with-resources closes the scanner even if reading fails
        try (Scanner scanner = new Scanner(new File("uploadedcsv/employees.csv"))) {
            // Split on commas only; spaces and line breaks stay inside tokens
            scanner.useDelimiter(",");

            while (scanner.hasNext()) {
                // Print each token followed by a visible separator
                System.out.print(scanner.next() + "|");
            }
        }
    }
}

For a CSV file containing:

a,b,c d,e
1,2,3 4,5
X,Y,Z A,B

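The output will be as follows, assuming the file has no trailing line break. The line breaks appear only because each newline stays inside a token: the last field of one row and the first field of the next row (for example, "e" and "1") are returned together as a single token, since the newline is no longer a delimiter: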

a|b|c d|e
1|2|3 4|5
X|Y|Z A|B|
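
With a comma-only delimiter, line breaks are not treated as field boundaries. One possible way to address this is a delimiter pattern that matches either a comma or a line break. The sketch below is an assumption on top of the article's approach: the class name DelimiterComparison and the pattern ",|\\R" (where \R is the regex linebreak matcher available since Java 8) are not from the original text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class comparing two delimiter patterns on the same input.
class DelimiterComparison {

    // Collects all tokens produced with the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String csv = "a,b,c d,e\n1,2,3 4,5";

        // Comma only: "e\n1" is returned as a single token.
        System.out.println(tokens(csv, ",").size());      // prints 7

        // Comma or line break: "e" and "1" become separate tokens.
        System.out.println(tokens(csv, ",|\\R").size());  // prints 8
    }
}
```

Either way the result is a flat stream of tokens with no record of where each row ended, so code that needs row structure would still have to read the file line by line.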

Complexity of CSV Format

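While a simple comma delimiter can solve basic problems, the CSV format is actually quite complex. Complete CSV parsing must also handle the cases described in RFC 4180, such as fields enclosed in double quotes, commas and line breaks inside quoted fields, and double quotes escaped by doubling them.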
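
One scenario a plain comma delimiter cannot handle is a quoted field that itself contains a comma. The following is a minimal sketch of a quote-aware parser for a single record, assuming RFC 4180 style quoting and no line breaks inside fields. The class and method names (QuotedFieldSketch, parseRecord) are hypothetical, and the code is an illustration of the problem rather than a complete CSV parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; handles quotes and doubled quotes within one line only.
class QuotedFieldSketch {

    static List<String> parseRecord(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"Smith, John\",42,\"He said \"\"hi\"\"\"";
        System.out.println(line.split(",").length);     // naive split: prints 4
        System.out.println(parseRecord(line).size());   // quote-aware: prints 3
    }
}
```

The naive split cuts "Smith, John" in two, while the quote-aware version keeps it as one field. Line breaks inside quoted fields are still not covered here.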

Advantages of Professional CSV Libraries

For production environments, it's recommended to use a professional CSV parsing library rather than hand-written Scanner code. Such libraries already handle the various edge cases of the CSV format and provide more reliable data processing.

Best Practice Recommendations

When implementing CSV parsing, it's recommended to:

  1. Always explicitly set the delimiter
  2. Handle possible exception scenarios
  3. Validate data integrity
  4. Consider using professional CSV libraries
  5. Follow RFC 4180 standards
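
The first three recommendations can be sketched together as follows. This is only an illustration under stated assumptions: the input is a simple CSV without quoted fields, the first line is treated as the header that defines the expected column count, and the names CsvRowValidation and readRows are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical sketch: explicit delimiter, basic validation, explicit error handling.
class CsvRowValidation {

    // Reads rows line by line and checks each one against the header's column count.
    static List<String[]> readRows(Readable source) {
        List<String[]> rows = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            int expected = -1;
            int lineNumber = 0;
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                lineNumber++;
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Explicit comma delimiter; limit -1 keeps trailing empty fields.
                String[] fields = line.split(",", -1);
                if (expected < 0) {
                    expected = fields.length; // header defines the column count
                } else if (fields.length != expected) {
                    throw new IllegalArgumentException("Line " + lineNumber
                            + ": expected " + expected + " fields, found " + fields.length);
                }
                rows.add(fields);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        try {
            // The second data line is missing a field, so validation fails.
            readRows(new StringReader("first,last\njohn,smith\njane\n"));
        } catch (IllegalArgumentException e) {
            System.err.println("Invalid CSV: " + e.getMessage());
        }
    }
}
```

Taking a Readable keeps the sketch testable with in-memory strings; a file could be supplied through a FileReader in the same way.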

By setting the delimiter with Scanner.useDelimiter(), developers can avoid the whitespace-splitting problem when reading simple CSV files. Files that contain quoted fields, embedded commas, or embedded line breaks call for a dedicated CSV library.
