Properly Reading UTF-8 Encoded InputStream in Java

Nov 23, 2025 · Programming

Keywords: Java | UTF-8 | InputStream

Abstract: This article examines character encoding issues when reading UTF-8 encoded text files from the network in Java. By analyzing the charset specification mechanism of InputStreamReader, it explains the causes of garbled characters with default encoding and provides two correct solutions for pre- and post-Java 7 environments. The discussion covers fundamental encoding principles and best practices to help developers avoid common pitfalls.

Problem Background and Phenomenon Analysis

Reading text files from remote servers is a common requirement in Java network programming. When a file contains non-ASCII characters and the character encoding is not specified correctly, the text comes out garbled. The original code references the file with URL url = new URL("http://kuehldesign.net/test.txt"); and then wraps the stream in a reader: BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));.

In-depth Analysis of Garbled Text Causes

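The test file contains the text ¡Hélló!, but the program prints > ¬°H√©ll√≥! (the leading > comes from the println call). The root cause is that the InputStreamReader constructor is called without a charset, so the reader falls back to the platform default encoding. Each non-ASCII character in the file occupies two bytes in UTF-8, and a single-byte default charset decodes those two bytes as two unrelated characters; the output shown here is consistent with a Mac OS Roman default. Because the default varies by operating system and locale (it became UTF-8 only in JDK 18), the same code behaves differently across deployments.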
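The effect can be reproduced without a network connection. The following is a minimal sketch, not the original program: the class name MojibakeDemo and the helper decodeAs are illustrative. It encodes the test string as UTF-8 and then decodes the bytes with a mismatched charset. ISO-8859-1 stands in for the wrong platform default because every JVM is required to support it, so the garbage characters differ from the ones reported above while the mechanism is the same.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Encode `text` as UTF-8, then decode the bytes with `wrong`, which is
    // what an InputStreamReader does when its charset does not match the data.
    static String decodeAs(String text, Charset wrong) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrong);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of the source encoding.
        String original = "\u00A1H\u00E9ll\u00F3!"; // the test string from the article

        String garbled = decodeAs(original, StandardCharsets.ISO_8859_1);
        String intact = decodeAs(original, StandardCharsets.UTF_8);

        // 7 characters become 10: each accented character was 2 bytes.
        System.out.println(original.length() + " -> " + garbled.length());
        System.out.println(intact.equals(original));
    }
}
```

With a matching charset the round trip is lossless; with a single-byte charset every two-byte UTF-8 sequence turns into two characters, which is why the garbled string is longer than the original.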

Solution Implementation

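The fix is to pass the charset explicitly. The String overload works on every Java version but declares the checked UnsupportedEncodingException: BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));. Since Java 7, the StandardCharsets.UTF_8 constant is recommended because it cannot be misspelled and needs no exception handling for the charset lookup: BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));.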
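To compare the two overloads side by side, here is a small sketch that reads from an in-memory ByteArrayInputStream instead of the network, so it runs offline. The class and method names (CharsetOverloads, readFirstLineByName, readFirstLineByConstant) are hypothetical and exist only for this illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class CharsetOverloads {
    // Charset given by name: the constructor declares the checked
    // UnsupportedEncodingException, which is a subclass of IOException.
    static String readFirstLineByName(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, "UTF-8"))) {
            return reader.readLine();
        }
    }

    // Charset given as a constant (Java 7+): no name lookup can fail;
    // IOException is still declared because reading itself may fail.
    static String readFirstLineByConstant(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u00A1H\u00E9ll\u00F3!".getBytes(StandardCharsets.UTF_8);
        String byName = readFirstLineByName(new ByteArrayInputStream(utf8));
        String byConstant = readFirstLineByConstant(new ByteArrayInputStream(utf8));
        System.out.println(byName.equals(byConstant)); // true
    }
}
```

Both variants decode identically; the difference lies only in how the charset is named and which exceptions the caller has to account for.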

Character Encoding Principles

UTF-8 is a variable-length encoding of Unicode that can represent every Unicode character, using one to four bytes per character. When no encoding is specified, InputStreamReader uses the JVM's default charset, which may not match the encoding of the source file and therefore produces an incorrect byte-to-character conversion.
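The variable-length property is easy to observe directly. The sketch below (class name Utf8Lengths and helper utf8Length are illustrative) counts the UTF-8 bytes needed for four characters taken from different Unicode ranges.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Lengths {
    // Number of bytes UTF-8 needs to encode the given text.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // U+0041, ASCII: 1
        System.out.println(utf8Length("\u00E9"));       // U+00E9, e with acute: 2
        System.out.println(utf8Length("\u20AC"));       // U+20AC, euro sign: 3
        System.out.println(utf8Length("\uD83D\uDE00")); // U+1F600, surrogate pair in Java: 4
    }
}
```

A decoder that assumes one byte per character has no way to reassemble the multi-byte sequences, which is exactly the failure described above.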

Complete Code Example

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UTF8Reader {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://kuehldesign.net/test.txt");
        
        // Recommended approach for Java 7+: explicit charset, and
        // try-with-resources so the reader is closed even if reading fails
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println("> " + line);
            }
        }
    }
}

Best Practices

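Always specify the character encoding explicitly when reading network resources. Use try-with-resources so that the reader and its underlying stream are closed even when an exception is thrown: try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) { ... }. Two further points deserve attention. First, InputStreamReader does not throw on malformed input by default; it silently substitutes the replacement character U+FFFD, so strict validation requires a CharsetDecoder configured with CodingErrorAction.REPORT. Second, url.openStream() offers no way to set timeouts; obtain a URLConnection via url.openConnection() and call setConnectTimeout and setReadTimeout before reading.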
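One possible way to handle encoding errors and timeouts is sketched here. The class name StrictUtf8, the helpers openWithTimeouts and strictReader, and the timeout values are assumptions made for illustration; only the JDK calls themselves are standard. The demo in main feeds deliberately invalid bytes from memory so that it runs without network access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {
    // Open the URL with explicit timeouts; the millisecond values are
    // illustrative and should be tuned for the application.
    static InputStream openWithTimeouts(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(10_000);
        return conn.getInputStream();
    }

    // Reader that throws CharacterCodingException on invalid UTF-8
    // instead of silently substituting U+FFFD.
    static BufferedReader strictReader(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new BufferedReader(new InputStreamReader(in, decoder));
    }

    public static void main(String[] args) throws IOException {
        // 0xE9 is e-acute in ISO-8859-1 but an incomplete sequence in UTF-8.
        byte[] invalid = {'H', (byte) 0xE9, 'l', 'l', 'o'};
        try (BufferedReader reader = strictReader(new ByteArrayInputStream(invalid))) {
            System.out.println(reader.readLine());
        } catch (CharacterCodingException e) {
            System.out.println("Input is not valid UTF-8: " + e);
        }
    }
}
```

For a real download the two helpers would be combined as strictReader(openWithTimeouts(url)) inside a try-with-resources statement. Whether to fail fast on malformed input or to accept replacement characters is a design decision that depends on how much the application trusts the server's declared encoding.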

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.