Complete Guide to Enabling UTF-8 in Java Web Applications

Nov 26, 2025 · Programming

Keywords: java | mysql | tomcat | encoding | utf-8

Abstract: This article provides a comprehensive guide to configuring UTF-8 encoding in Java web applications using servlets and JSP with Tomcat and MySQL. It covers server settings, custom filters, JSP encoding, HTML meta tags, database connections, and handling special characters in GET requests, ensuring support for international characters like Finnish and Cyrillic.

Introduction

UTF-8 encoding is essential for web applications that need to support international characters, such as the Finnish letters ä, ö, å and Cyrillic letters like Ц, ж, Ф. This guide explains how to enable UTF-8 in Java web applications built with servlets and JSP and deployed on Tomcat with a MySQL database. The configuration is applied step by step so that every stage from request to response, including storage, uses UTF-8 and characters are not corrupted along the way.

Configuring Tomcat Server

To handle GET request parameters in UTF-8 encoding, modify the Tomcat server.xml file. Add the URIEncoding="UTF-8" attribute to the Connector element, as shown in this example code:

<Connector port="8080" maxHttpHeaderSize="8192" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" compression="on" compressionMinSize="128" noCompressionUserAgents="gozilla, traviata" compressableMimeType="text/html,text/xml,text/plain,text/css,text/javascript,application/x-javascript,application/javascript" URIEncoding="UTF-8"/>

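This setting tells Tomcat to decode the percent-encoded bytes in the request URI, including the query string, as UTF-8. For instance, the character ж arrives as %D0%B6 and is decoded back to ж. The setting only affects parameters carried in the URL, which in practice means GET requests; the body of a POST request is not affected and is governed by the request character encoding instead. Tomcat 7 and earlier default to ISO-8859-1 for URIEncoding, while Tomcat 8 and later default to UTF-8 unless strict servlet compliance is enabled, so on newer versions the attribute mainly makes the choice explicit.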
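The effect of the decoding charset can be reproduced outside the container. The following is a minimal sketch that uses the JDK's URLDecoder as a stand-in for what a connector does with URIEncoding; the class and method names are made up for illustration and are not part of Tomcat.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriDecodingDemo {
    // Decodes a percent-encoded value with the given charset name.
    // This only approximates a connector's URIEncoding handling.
    public static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%D0%B6"; // UTF-8 bytes of U+0436 (Cyrillic zhe)

        // Decoded as UTF-8, the two bytes form the single character U+0436.
        System.out.println(decode(raw, "UTF-8").equals("\u0436"));

        // Decoded as ISO-8859-1, each byte becomes its own character
        // (U+00D0 and U+00B6), which is the typical corrupted output.
        System.out.println(decode(raw, "ISO-8859-1").length());
    }
}
```

The same two bytes yield one correct character or two wrong ones depending only on the charset chosen for decoding, which is the choice the URIEncoding attribute controls.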

Implementing a Charset Filter

A custom filter can enforce UTF-8 encoding for all requests and responses. Here is an example CharsetFilter:

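package com.example.filters;

// On Tomcat 10 and later, use the jakarta.servlet package instead of javax.servlet.
import javax.servlet.*;
import java.io.IOException;

public class CharsetFilter implements Filter {
    private String encoding;

    @Override
    public void init(FilterConfig config) throws ServletException {
        // Read the encoding from web.xml; fall back to UTF-8 if it is not set.
        encoding = config.getInitParameter("requestEncoding");
        if (encoding == null) {
            encoding = "UTF-8";
        }
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Must happen before any request parameter is read, or it has no effect.
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding(encoding);
        }
        // Default content type for responses; a servlet or JSP that sets its own
        // content type later overrides this.
        response.setContentType("text/html; charset=UTF-8");
        response.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // Nothing to release.
    }
}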

Configure this filter in web.xml:

<filter>
    <filter-name>CharsetFilter</filter-name>
    <filter-class>com.example.filters.CharsetFilter</filter-class>
    <init-param>
        <param-name>requestEncoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>CharsetFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

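This filter ensures that if the request does not specify an encoding, which is the usual case for browser form submissions, it defaults to UTF-8. It also sets the response content type and encoding. Because the mapping is /*, the filter runs for every request before any servlet or JSP reads its parameters.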
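To see why the request encoding has to be set before parameters are read, it helps to look at what happens to the raw bytes when the wrong charset is assumed. The sketch below is plain Java with no servlet API involved; the class and method names are hypothetical and only mimic the decoding step a container performs on a request body.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequestCharsetDemo {
    // Encodes the text as UTF-8 (as a browser would for a UTF-8 page) and then
    // decodes those bytes with the charset the server assumes.
    public static String decodeAs(String text, Charset assumed) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, assumed);
    }

    public static void main(String[] args) {
        String sent = "\u00E4"; // the letter a with diaeresis, as typed by the user

        // Server assumes UTF-8: the value survives unchanged.
        System.out.println(decodeAs(sent, StandardCharsets.UTF_8).equals(sent));

        // Server assumes ISO-8859-1: the two UTF-8 bytes 0xC3 0xA4 turn into
        // two separate characters, U+00C3 and U+00A4.
        System.out.println(decodeAs(sent, StandardCharsets.ISO_8859_1).length());
    }
}
```

Once a parameter has been decoded with the wrong charset, calling setCharacterEncoding afterwards does not repair it, which is why the filter is mapped ahead of all servlets.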

Setting JSP Page Encoding

Configure JSP pages to use UTF-8 encoding globally via web.xml:

<jsp-config>
    <jsp-property-group>
        <url-pattern>*.jsp</url-pattern>
        <page-encoding>UTF-8</page-encoding>
    </jsp-property-group>
</jsp-config>

Or add a directive to the top of each JSP page:

<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

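The pageEncoding attribute tells the JSP compiler which encoding the page source is stored in, and the charset in contentType sets the encoding of the generated response. The .jsp files themselves must also be saved as UTF-8.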

Using HTML Meta Tags

Include meta tags in HTML pages to specify the character set, informing browsers to use UTF-8 for rendering. Example code:

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    ...
</head>
<body>
    ...
</body>
</html>

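When the server also sends a charset in the Content-Type response header, the header takes precedence over the meta tag. The meta tag still helps when a page is opened from disk or served without a charset, so using both avoids most browser encoding errors.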

Configuring JDBC Connection

Ensure the JDBC connection uses UTF-8 encoding by defining the resource in context.xml or similar:

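<Resource name="jdbc/AppDB" auth="Container" type="javax.sql.DataSource" maxActive="20" maxIdle="10" maxWait="10000" username="foo" password="bar" driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/ID_development?useUnicode=true&amp;characterEncoding=UTF-8"/>

The characterEncoding=UTF-8 parameter forces the connection to use UTF-8 encoding, and useUnicode=true is the accompanying Connector/J parameter (useEncoding is not a recognized property). Because the URL sits inside an XML attribute, the ampersand between the parameters must be written as &amp;. The driver class com.mysql.jdbc.Driver belongs to Connector/J 5.x; Connector/J 8 uses com.mysql.cj.jdbc.Driver.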
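The same connection URL is often needed in Java code as well as in XML, and the two forms differ in how the ampersand is written. The following sketch builds the URL and shows the XML-escaped variant; the helper names are hypothetical, and the host, port, and database values are simply the ones used in this article.

```java
public class JdbcUrlDemo {
    // Builds a MySQL JDBC URL that asks Connector/J to use UTF-8 on the connection.
    public static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Escapes the characters that are not allowed raw inside a double-quoted
    // XML attribute value, such as the url attribute of a Resource element.
    public static String escapeForXmlAttribute(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String url = mysqlUrl("localhost", 3306, "ID_development");
        System.out.println(url);                        // form used in Java code
        System.out.println(escapeForXmlAttribute(url)); // form used in context.xml
    }
}
```

In Java source or a properties file the plain & is correct; only the XML configuration needs the &amp; form.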

MySQL Database and Tables

Create the database and tables with UTF-8 character set:

CREATE DATABASE `ID_development` DEFAULT CHARACTER SET utf8 COLLATE utf8_swedish_ci;
CREATE TABLE `Users` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(30) COLLATE utf8_swedish_ci DEFAULT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci;

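Declaring CHARSET=utf8 on the database and its tables makes MySQL store and return text as UTF-8. The utf8_swedish_ci collation sorts and compares strings by Swedish rules, which are commonly used for Finnish as well; choose a different collation if the application needs another ordering.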

MySQL Server Configuration

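Set the default character set in the MySQL configuration file (e.g., my.ini on Windows or my.cnf on Linux):

[client]
port=3306
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
character-set-server=utf8

The [client] and [mysql] sections set the default for client programs, and character-set-server in the [mysqld] section sets the default for the server itself. Restart MySQL after changing the file.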

MySQL Procedures and Functions

When creating stored procedures or functions, specify the character set:

DELIMITER $$

CREATE FUNCTION `pathToNode` (ryhma_id INT) RETURNS TEXT CHARACTER SET utf8
READS SQL DATA
BEGIN
    DECLARE path VARCHAR(255) CHARACTER SET utf8;
    SET path = NULL;
    -- logic code
    RETURN path;
END $$

DELIMITER ;

This ensures string operations use UTF-8 encoding.

Handling GET Requests

GET requests can suffer from encoding inconsistencies because browsers do not always percent-encode URL characters the same way. When the URL is encoded as UTF-8, ж is sent as %D0%B6 and ä as %C3%A4, and Tomcat configured with URIEncoding="UTF-8" decodes both correctly. If the browser instead encodes ä as Latin-1, it is sent as %E4, which is not valid UTF-8, so the parameter is corrupted on the server. POST requests do not have this issue as long as the page itself is served as UTF-8, because browsers encode form data using the page encoding.
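The two encodings of ä, and the result of mixing them, can be checked with the JDK's URLEncoder and URLDecoder. This is an illustrative sketch only: the class and method names are made up, and the JDK classes stand in for the browser and the server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class GetEncodingDemo {
    // Percent-encodes a value the way a client using the given charset would.
    public static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decodes a percent-encoded value the way a server using the given charset would.
    public static String decode(String value, String charsetName) {
        try {
            return URLDecoder.decode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00E4"; // the letter a with diaeresis

        System.out.println(encode(aUmlaut, "ISO-8859-1")); // %E4
        System.out.println(encode(aUmlaut, "UTF-8"));      // %C3%A4

        // Client encoded as Latin-1, server decodes as UTF-8: 0xE4 alone is not
        // a valid UTF-8 sequence, so the original letter is lost.
        System.out.println(decode("%E4", "UTF-8").equals(aUmlaut)); // false
    }
}
```

The mismatch cannot be detected reliably on the server, so the practical fix is to make sure links and forms are generated from UTF-8 pages and that parameter values placed in URLs are percent-encoded as UTF-8.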

Additional Notes

MySQL's utf8 character set stores at most three bytes per character and therefore only covers the Basic Multilingual Plane; for characters that need four bytes in UTF-8, such as emoji, use utf8mb4 or VARBINARY types. If Apache fronts Tomcat via mod_jk, add URIEncoding="UTF-8" to the AJP connector in Tomcat and set AddDefaultCharset utf-8 in Apache's httpd.conf. Resource files such as .properties files can cause similar problems, so save them as UTF-8 and make sure they are read with that encoding.
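Whether a given string fits into a three-byte utf8 column can be checked on the Java side before it reaches the database. The sketch below is one possible check, assuming Java 8 or later; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Mb3Check {
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every code point lies in the Basic Multilingual Plane, meaning
    // each character needs at most three bytes in UTF-8.
    public static boolean fitsInThreeByteUtf8(String s) {
        return s.codePoints().noneMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String finnish = "\u00E4";      // 2 bytes in UTF-8
        String cyrillic = "\u0436";     // 2 bytes in UTF-8
        String euro = "\u20AC";         // 3 bytes in UTF-8
        String emoji = "\uD83D\uDE00";  // U+1F600, 4 bytes in UTF-8

        System.out.println(utf8Length(finnish) + " " + utf8Length(cyrillic)
                + " " + utf8Length(euro) + " " + utf8Length(emoji));
        System.out.println(fitsInThreeByteUtf8(finnish + cyrillic + euro)); // true
        System.out.println(fitsInThreeByteUtf8(emoji));                     // false
    }
}
```

Finnish and Cyrillic text passes the check, so the utf8 setup in this guide is sufficient for it; input that fails the check needs utf8mb4 columns and a matching connection character set.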

Conclusion

By comprehensively configuring Tomcat, filters, JSP, HTML, JDBC, and MySQL, Java web applications can fully support UTF-8 encoding, effectively handling international characters. It is recommended to test compatibility across different browsers in both development and production environments to ensure a smooth user experience.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.