URL Encoding of the Space Character: A Comparative Analysis of + vs %20

Keywords: URL encoding | space encoding | percent encoding | HTML forms | query string

Abstract: This technical paper provides an in-depth analysis of the two encoding methods for space characters in URLs: '+' and '%20'. By examining the differences between HTML form data submission and standard URI encoding specifications, it explains why '+' encoding is commonly found in query strings while '%20' is mandatory in URL paths. The article combines W3C standards, historical evolution, and practical development cases to offer comprehensive technical insights and programming guidance for proper URL encoding implementation.

Fundamental Concepts of URL Encoding

URL encoding, also known as percent-encoding, converts characters that cannot appear literally in a URL into a safe form: each such byte is written as '%' followed by two hexadecimal digits. Because URLs are restricted to a subset of ASCII, non-ASCII characters and reserved or unsafe characters must be encoded. The space character, one of the most common characters requiring encoding, has two encodings in practice: '+' and '%20'.
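To make the mechanism concrete, here is a minimal sketch of percent-encoding written by hand. The function name percentEncode is hypothetical, and the sketch assumes UTF-8 and treats only the RFC 3986 unreserved characters as safe; in real code the built-in encodeURIComponent() would normally be used instead.

```javascript
// Unreserved characters per RFC 3986: letters, digits, '-', '.', '_', '~'
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Hypothetical helper: percent-encode every byte that is not unreserved
function percentEncode(text) {
    const bytes = new TextEncoder().encode(text); // UTF-8 bytes of the input
    let out = '';
    for (const byte of bytes) {
        const ch = String.fromCharCode(byte);
        out += byte < 0x80 && UNRESERVED.test(ch)
            ? ch
            : '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
    return out;
}

console.log(percentEncode('a b'));       // a%20b
console.log(percentEncode('caf\u00e9')); // caf%C3%A9
```

The space (byte 0x20) becomes '%20', and the two UTF-8 bytes of 'é' become '%C3%A9'.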

Historical Context and Specification Differences

This encoding discrepancy stems from the historical development of the web. The generic URI percent-encoding rules (today RFC 3986) encode a space as '%20', and this remains the only form that is valid in every URL component. However, as HTML form submission evolved, a modified encoding scheme emerged.

According to the W3C HTML4 specification, when HTML form data is submitted via GET or POST methods, it defaults to using the application/x-www-form-urlencoded encoding format. This format is based on early URI percent-encoding rules but includes significant modifications: replacing spaces with '+' instead of '%20', along with other changes such as newline normalization.
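As an illustration of the form encoding described above, the standard URLSearchParams API serializes data as application/x-www-form-urlencoded. The field names q and op are made up for this sketch.

```javascript
// Build form-style data; the field names are hypothetical
const form = new URLSearchParams();
form.append('q', 'blue light'); // contains a space
form.append('op', '1+1');       // contains a literal plus sign

// Serialize as application/x-www-form-urlencoded
const body = form.toString();
console.log(body); // q=blue+light&op=1%2B1
```

The space is written as '+', so the literal '+' has to be escaped as '%2B' to stay distinguishable.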

Encoding Rules for Different URL Components

The structural complexity of URLs necessitates different encoding rules for various components. A complete URL can be decomposed into multiple parts:

https://user:pass@example.com:8080/path/file;param=value?query=string#fragment
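The decomposition can be inspected with the standard URL API, as in the short sketch below. The property names on the parts object are chosen here for readability and are not part of any specification.

```javascript
const parsed = new URL(
    'https://user:pass@example.com:8080/path/file;param=value?query=string#fragment'
);

// Collect the individual components under illustrative names
const parts = {
    scheme: parsed.protocol,   // 'https:'
    username: parsed.username, // 'user'
    password: parsed.password, // 'pass'
    host: parsed.hostname,     // 'example.com'
    port: parsed.port,         // '8080'
    path: parsed.pathname,     // '/path/file;param=value'
    query: parsed.search,      // '?query=string'
    fragment: parsed.hash,     // '#fragment'
};

console.log(parts);
```

Note that the URL API reports ';param=value' as part of the path rather than as a separate component.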

In the path component (before '?'), spaces must be encoded as '%20' and should never use '+'. The path follows the generic URI encoding rules, under which '+' is an ordinary character: a '+' in a path means a literal plus sign, not a space. For example, the path '/documents/my file.txt' should be encoded as '/documents/my%20file.txt'.
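A minimal sketch of path encoding follows. The helper encodePath is hypothetical; it assumes the path is supplied as a list of unencoded segments, so that '/' inside a segment would also be escaped.

```javascript
// Hypothetical helper: encode each segment separately, then join with '/'
function encodePath(segments) {
    return '/' + segments.map((s) => encodeURIComponent(s)).join('/');
}

const encodedPath = encodePath(['documents', 'my file.txt']);
console.log(encodedPath); // /documents/my%20file.txt

// Generic percent-decoding leaves '+' untouched, so it is not a space here
const decodedPlus = decodeURIComponent('/documents/my+file.txt');
console.log(decodedPlus); // /documents/my+file.txt
```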

In the query string component (after '?'), the situation differs. Due to historical compatibility reasons, spaces can be encoded as either '+' or '%20'. This flexibility leads to an important consequence: the '+' character itself must be encoded as '%2B' in query strings to avoid ambiguity.
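The ambiguity can be observed with URLSearchParams, which decodes query strings using the form rules. The parameter name q is only an example.

```javascript
// '+' and '%20' both decode to a space in a form-encoded query string
const fromPlus = new URLSearchParams('q=blue+light').get('q');
const fromPercent = new URLSearchParams('q=blue%20light').get('q');

// A literal plus sign survives only when it was sent as '%2B'
const literalPlus = new URLSearchParams('q=blue%2Blight').get('q');

console.log(fromPlus);    // blue light
console.log(fromPercent); // blue light
console.log(literalPlus); // blue+light
```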

Practical Implementation Examples

Consider the encoding of a string containing spaces 'blue light blue' in different URL components:

// Path uses %20 for spaces; query uses +
http://example.com/blue%20light%20blue?blue+light+blue

In the path '/blue light blue', both spaces are encoded as '%20'. In the query string '?blue+light+blue', spaces are encoded as '+', but if the string contains actual '+' characters, such as 'blue+light', the '+' must be encoded as '%2B', resulting in 'blue%2Blight'.
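One way to produce this URL programmatically is sketched below. It assumes the form-style query is derived from encodeURIComponent() by swapping '%20' for '+', which is an illustrative shortcut rather than the only valid approach.

```javascript
const text = 'blue light blue';

// Path: standard percent-encoding, spaces become %20
const pathPart = '/' + encodeURIComponent(text);

// Query: form-style encoding, spaces become '+'
const queryPart = encodeURIComponent(text).replace(/%20/g, '+');

const fullUrl = `http://example.com${pathPart}?${queryPart}`;
console.log(fullUrl); // http://example.com/blue%20light%20blue?blue+light+blue

// A literal '+' in the data is escaped before any '+' for spaces is introduced
const withPlus = encodeURIComponent('blue+light').replace(/%20/g, '+');
console.log(withPlus); // blue%2Blight
```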

Implementation Variations Across Programming Languages

Different programming languages provide various URL encoding functions with differing space handling:

// JavaScript example
let original = 'hello world';
let encoded1 = encodeURIComponent(original); // 'hello%20world'
let encoded2 = encodeURIComponent(original).replace(/%20/g, '+'); // 'hello+world' (form style; other characters are still escaped)

// PHP example
$original = 'hello world';
$encoded1 = urlencode($original);    // 'hello+world'
$encoded2 = rawurlencode($original); // 'hello%20world'

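JavaScript's encodeURIComponent() function follows standard URI percent-encoding, converting spaces to '%20'; it has no built-in counterpart that emits '+', so form-style output has to be derived from it or produced with URLSearchParams. PHP offers two functions: urlencode() uses '+' for spaces (compatible with form submission), while rawurlencode() uses '%20' (RFC 3986 compliant).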
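For readers who want both behaviours in JavaScript, the sketch below defines two hypothetical helpers, rawUrlEncode and formUrlEncode, loosely modelled on the PHP pair. They are approximations: encodeURIComponent() leaves ! ' ( ) * unescaped, so those are escaped explicitly, and exact byte-for-byte parity with PHP (for example its handling of '~' in urlencode()) is not claimed.

```javascript
// Hypothetical helper: percent-encoding that also escapes ! ' ( ) *
function rawUrlEncode(str) {
    return encodeURIComponent(str).replace(
        /[!'()*]/g,
        (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
    );
}

// Hypothetical helper: same as above, but spaces become '+' (form style)
function formUrlEncode(str) {
    return rawUrlEncode(str).replace(/%20/g, '+');
}

console.log(rawUrlEncode('hello world (1)!'));  // hello%20world%20%281%29%21
console.log(formUrlEncode('hello world (1)!')); // hello+world+%281%29%21
```

Because literal '+' characters are already escaped to '%2B' by encodeURIComponent(), the final replacement in formUrlEncode only ever touches encoded spaces.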

Modern Web Development Best Practices

In contemporary web development, the following principles are recommended:

Always use '%20' for space encoding in URL path components
In query strings, prefer '%20' for consistency, but handle potential '+' encoding from received data
When generating URLs, explicitly specify encoding methods rather than relying on browser auto-encoding
When parsing URLs, ensure proper handling of both encoding formats
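The last principle can be sketched as a small tolerant parser. The function parseQuery is hypothetical and deliberately simple: it assumes well-formed percent sequences (decodeURIComponent() throws on malformed ones) and keeps only the last value for a repeated key.

```javascript
// Hypothetical helper: parse a query string, accepting both '+' and '%20'
function parseQuery(query) {
    const result = {};
    const trimmed = query.startsWith('?') ? query.slice(1) : query;
    // Turn '+' into a space first, then percent-decode, so '%2B' stays a plus
    const decode = (s) => decodeURIComponent(s.replace(/\+/g, ' '));

    for (const pair of trimmed.split('&')) {
        if (pair === '') continue; // skip empty pairs such as a trailing '&'
        const eq = pair.indexOf('=');
        const rawKey = eq === -1 ? pair : pair.slice(0, eq);
        const rawValue = eq === -1 ? '' : pair.slice(eq + 1);
        result[decode(rawKey)] = decode(rawValue);
    }
    return result;
}

const parsedQuery = parseQuery('?a=hello+world&b=hello%20world&c=1%2B1');
console.log(parsedQuery); // { a: 'hello world', b: 'hello world', c: '1+1' }
```

The order of operations matters: replacing '+' after percent-decoding would wrongly turn a decoded '%2B' into a space.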

The following function sketches how a URL can be normalized so that spaces in both the path and the query string are encoded as '%20':

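function encodeURLComponents(url) {
    // Parse the URL; the parser already percent-encodes spaces in the path
    const urlObj = new URL(url);

    // Path: decode each segment, then re-encode it, so that nothing is
    // double-encoded (decodeURIComponent throws on malformed sequences)
    const encodedPath = urlObj.pathname
        .split('/')
        .map((segment) => encodeURIComponent(decodeURIComponent(segment)))
        .join('/');

    // Query: URLSearchParams already decodes both '+' and '%20' to spaces,
    // so each key and value only needs to be re-encoded with %20
    const params = new URLSearchParams(urlObj.search);
    const encodedQuery = Array.from(params, ([key, value]) =>
        `${encodeURIComponent(key)}=${encodeURIComponent(value)}`
    ).join('&');

    // Append '?' only when a query exists, and keep the fragment
    return `${urlObj.origin}${encodedPath}${encodedQuery ? '?' + encodedQuery : ''}${urlObj.hash}`;
}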

Compatibility and Standardization Trends

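Despite this historical encoding discrepancy, '%20' is the only form that is valid in every URL component, which makes it the safer default when generating URLs. The '+' form is maintained primarily for compatibility with form submission, and it is still produced by some current APIs: URLSearchParams, for example, serializes spaces as '+'. Decoders of query strings therefore need to keep accepting both forms.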

Developers working with URL encoding should understand this historical context but are advised to prioritize standard encoding methods in new projects to ensure long-term maintainability and cross-platform compatibility.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.