Allowed Characters in Cookies: Historical Specifications, Browser Implementations, and Best Practices

Dec 02, 2025 · Programming

Keywords: Cookie character set | browser compatibility | RFC 6265

Abstract: This article explores the allowed character sets in cookie names and values, based on the original Netscape specification, RFC standards, and real-world browser behaviors. It analyzes the handling of special characters like hyphens, compatibility issues with non-ASCII characters, and compares standards such as RFC 2109, 2965, and 6265. Through code examples and detailed explanations, it provides practical guidance for developers to use cookies safely in cross-browser environments, emphasizing adherence to the RFC 6265 subset to avoid common pitfalls.

Historical Background and Evolution of Cookie Character Specifications

Cookies were first defined by Netscape in the 1990s. Its original specification (often referred to as cookie_spec) set out the basic rules for the NAME=VALUE string: the string is a sequence of characters excluding semicolons, commas, and whitespace. If such characters are needed, the spec recommends an encoding such as URL-style %XX encoding, though it neither defines nor requires one. Characters such as the hyphen (-) are therefore permitted, and in practice most browsers accept them. However, the specification left gaps: it did not address control characters or non-ASCII characters at all, which led to later compatibility problems.
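The Netscape rule can be expressed as a simple check on the whole NAME=VALUE string. The following is a minimal sketch; the function name `isNetscapeSafePair` is hypothetical, and requiring a non-empty name before the first "=" is an assumption added for illustration. Beyond whitespace, the check does not reject control or non-ASCII characters, because the original spec said nothing about them.

```javascript
// Hypothetical helper: checks a NAME=VALUE string against the original Netscape rule,
// which only excludes semicolons, commas, and whitespace.
function isNetscapeSafePair(pair) {
    var eq = pair.indexOf("=");
    if (eq <= 0) {
        return false; // Assumption: require a non-empty NAME before the first "="
    }
    // Reject semicolons, commas, and whitespace; other control or non-ASCII
    // characters are not rejected, since the spec did not address them.
    return !/[;,\s]/.test(pair);
}

console.log(isNetscapeSafePair("user-id=abc-123")); // true: hyphens are allowed
console.log(isNetscapeSafePair("user id=abc"));     // false: contains a space
console.log(isNetscapeSafePair("ids=1,2,3"));       // false: contains commas
```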

Handling of Special Characters and Browser Behaviors

In cookie names and values, certain special characters require care. The equals sign (=) should be avoided in names, because browsers split the name from the value at the first equals sign. In values it is permitted and unambiguous under that rule, but naive parsers that split on every equals sign can truncate the value. Spaces and commas often work in real-world tests, yet both the Netscape specification and RFC 6265 exclude them, so they should be avoided for compatibility. Control characters (\x00 to \x1F and \x7F) are typically filtered or rejected by browsers and should not be used in cookies.
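The first-equals-sign rule and the control-character check can be illustrated with a small parser for a single name=value pair. This is a sketch that assumes one pair has already been isolated (for example, by splitting a Cookie header on semicolons); `parseCookiePair` is a hypothetical name, and it returns null rather than throwing for input it rejects.

```javascript
// Hypothetical helper: splits one "name=value" pair at the FIRST "=" only,
// so any later "=" characters remain part of the value.
function parseCookiePair(pair) {
    // Reject control characters \x00-\x1F and \x7F outright
    if (/[\x00-\x1F\x7F]/.test(pair)) {
        return null;
    }
    var eq = pair.indexOf("=");
    if (eq === -1) {
        return null; // No "=": not a name=value pair
    }
    return { name: pair.substring(0, eq), value: pair.substring(eq + 1) };
}

// The value keeps its own "=" characters (e.g. base64 padding)
console.log(parseCookiePair("token=abc==").value); // abc==

// A naive split on every "=" loses them
console.log("token=abc==".split("=")[1]);          // abc
```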

Non-ASCII characters (e.g., Unicode text) are where browser implementations have diverged most. In historical tests of older browser versions, Opera and Chrome sent UTF-8, Internet Explorer used the system's local code page, Firefox used only the low byte of each UTF-16 code unit, and Safari refused to send cookies containing non-ASCII characters at all. Because of this variability, cross-browser applications should not place non-ASCII characters directly in cookies and should instead apply an encoding such as URL encoding (for example, with JavaScript's encodeURIComponent function), whose output is pure ASCII. Below is an example demonstrating how to safely set and read back a cookie value containing special characters:

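// Use encodeURIComponent to encode the cookie value ("user data@123" becomes "user%20data%40123")
var cookieValue = encodeURIComponent("user data@123");
document.cookie = "userData=" + cookieValue + "; path=/";
// Read it back: locate the "userData=" pair in document.cookie, then decode its value
var pair = document.cookie.split(/;\s*/).find(function (c) { return c.indexOf("userData=") === 0; });
var decodedValue = pair ? decodeURIComponent(pair.substring("userData=".length)) : null;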
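Because browsers disagreed on how to transmit non-ASCII bytes, the property that matters is that URL-encoded output contains only ASCII. The sketch below runs without `document`, so it works in Node as well as a browser; `isAsciiOnly` is a hypothetical helper name. It shows that encodeURIComponent turns non-ASCII text into UTF-8 percent escapes and that decodeURIComponent restores the original string.

```javascript
// Hypothetical helper: true if every character is in the ASCII range \x00-\x7F
function isAsciiOnly(str) {
    return /^[\x00-\x7F]*$/.test(str);
}

var original = "café 東京";
var encoded = encodeURIComponent(original); // Percent-encodes the UTF-8 bytes

console.log(encoded);                                    // caf%C3%A9%20%E6%9D%B1%E4%BA%AC
console.log(isAsciiOnly(original), isAsciiOnly(encoded)); // false true
console.log(decodeURIComponent(encoded) === original);   // true
```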

Comparison of RFC Standards and Real-World Practices

To standardize cookie behavior, several RFCs have been published, but the earlier ones were never widely implemented. RFC 2109 and RFC 2965 introduced stricter rules, defining names as HTTP/1.1 tokens and allowing values to be quoted strings that could contain otherwise-forbidden characters, but browsers did not adopt these features. In contrast, RFC 6265 (published by the IETF in 2011) is much closer to real-world behavior. It defines the name character set as alphanumerics plus !#$%&'*+-.^_`|~ and the value character set as alphanumerics plus !#$%&'()*+-./:<=>?@[]^_`{|}~, which excludes whitespace, double quotes, commas, semicolons, and backslashes; a value may optionally be wrapped in a pair of double quotes. Control characters and non-ASCII characters are prohibited in both. RFC 6265 separates this strict syntax, which servers should follow when generating cookies, from the more lenient parsing algorithm it specifies for user agents, so it does not match every browser quirk. Its syntax is nonetheless the safe subset developers should use when generating cookies.
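The RFC 6265 character sets translate directly into two regular expressions: one for the token grammar used by names and one for the cookie-octet grammar used by values, including the optional surrounding double quotes. This is a minimal sketch for validating cookies before generating them; `isValidCookieName` and `isValidCookieValue` are hypothetical names, and the checks follow the server-side syntax rather than any particular browser's parser.

```javascript
// Hypothetical validators based on the RFC 6265 generation syntax.
// Names: token = alphanumerics plus !#$%&'*+-.^_`|~ (at least one character)
var TOKEN_RE = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;

// Values: cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E,
// i.e. printable ASCII except space, double quote, comma, semicolon, and backslash.
// The whole value may optionally be wrapped in one pair of double quotes.
var COOKIE_VALUE_RE = /^(?:[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*|"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*")$/;

function isValidCookieName(name) {
    return TOKEN_RE.test(name);
}

function isValidCookieValue(value) {
    return COOKIE_VALUE_RE.test(value);
}

console.log(isValidCookieName("session-id"));  // true
console.log(isValidCookieName("session id"));  // false (space)
console.log(isValidCookieName("a=b"));         // false ("=" is not a token character)
console.log(isValidCookieValue("x=1&y=[2]"));  // true
console.log(isValidCookieValue("\"quoted\"")); // true
console.log(isValidCookieValue("a,b"));        // false (comma)
console.log(isValidCookieValue("café"));       // false (non-ASCII)
```

Because encodeURIComponent output consists only of alphanumerics, -_.!~*'() and %XX escapes, an encoded value always passes `isValidCookieValue`; an encoded name, however, can still contain "(" or ")", which are not token characters.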

Best Practices and Code Examples

Based on the above analysis, developers should adopt the following practices when handling cookies: first, generate names and values only from the character sets defined in RFC 6265; second, always encode special or non-ASCII data (e.g., with URL encoding) before storing it; and finally, when parsing cookies, tolerate malformed or unexpected values (for example, a stray % sign makes decodeURIComponent throw a URIError) rather than letting one bad cookie break the whole parse. Below is a comprehensive example showing how to safely set and read cookies:

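// Encode a cookie name so it stays within the RFC 6265 token set:
// encodeURIComponent leaves "(" and ")" unescaped, but they are not token characters
function encodeCookieName(name) {
    return encodeURIComponent(name).replace(/\(/g, "%28").replace(/\)/g, "%29");
}

// Set a cookie, keeping the name and value within the RFC 6265 subset
function setCookie(name, value, days) {
    var encodedName = encodeCookieName(name); // Encode name to handle special characters
    var encodedValue = encodeURIComponent(value); // encodeURIComponent output is always valid cookie-value text
    var expires = "";
    if (days) {
        var date = new Date();
        date.setTime(date.getTime() + (days * 24 * 60 * 60 * 1000));
        expires = "; expires=" + date.toUTCString();
    }
    document.cookie = encodedName + "=" + encodedValue + expires + "; path=/";
}

// Read a cookie, decoding its value and tolerating malformed encodings
function getCookie(name) {
    var encodedName = encodeCookieName(name);
    var cookies = document.cookie.split(';');
    for (var i = 0; i < cookies.length; i++) {
        var cookie = cookies[i].trim();
        if (cookie.indexOf(encodedName + "=") === 0) {
            var raw = cookie.substring(encodedName.length + 1);
            try {
                return decodeURIComponent(raw);
            } catch (e) {
                return raw; // Malformed %-sequence (URIError): return the raw value instead of throwing
            }
        }
    }
    return null;
}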
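The `getCookie` function depends on `document.cookie`. The same tolerant parsing can be sketched as a pure function over a Cookie header string, which also makes it usable on a server or in tests. `parseCookieHeader` is a hypothetical name; the assumptions are that pairs are separated by semicolons, that names are kept exactly as they appear in the header, that the first occurrence of a repeated name wins, and that values which fail to decode are kept raw.

```javascript
// Hypothetical helper: parses a Cookie header such as "a=1; b=hello%20world" into an object.
// Pairs without "=" are skipped, and values that fail to decode are kept as-is.
function parseCookieHeader(header) {
    var result = Object.create(null); // No prototype, so names like "__proto__" are stored safely
    var pairs = header.split(";");
    for (var i = 0; i < pairs.length; i++) {
        var pair = pairs[i].trim();
        var eq = pair.indexOf("=");
        if (eq <= 0) {
            continue; // Skip empty segments, pairs without "=", and pairs with an empty name
        }
        var name = pair.substring(0, eq);
        var raw = pair.substring(eq + 1);
        var value;
        try {
            value = decodeURIComponent(raw);
        } catch (e) {
            value = raw; // Malformed %-sequence: keep the raw text instead of failing the whole parse
        }
        if (!(name in result)) {
            result[name] = value; // Keep the first occurrence if a name repeats
        }
    }
    return result;
}

var parsed = parseCookieHeader("theme=dark; msg=hello%20world; bad=100%; theme=light; junk");
console.log(parsed.theme);     // dark
console.log(parsed.msg);       // hello world
console.log(parsed.bad);       // 100%
console.log("junk" in parsed); // false
```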

In summary, understanding the complexities of cookie character sets is crucial for building stable web applications. By combining historical specifications, browser implementations, and modern standards, developers can avoid common pitfalls and ensure cross-environment compatibility.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.