Keywords: URL encoding | space encoding | percent encoding | HTML forms | query string
Abstract: This technical paper provides an in-depth analysis of the two encoding methods for space characters in URLs: '+' and '%20'. By examining the differences between HTML form data submission and standard URI encoding specifications, it explains why '+' encoding is commonly found in query strings while '%20' is mandatory in URL paths. The article combines W3C standards, historical evolution, and practical development cases to offer comprehensive technical insights and programming guidance for proper URL encoding implementation.
Fundamental Concepts of URL Encoding
URL encoding, also known as percent-encoding, is the process of converting characters that cannot be directly used in URLs into a safe format. Since URLs can only be transmitted using the ASCII character set, any non-ASCII characters or special characters require encoding. The space character, being one of the most common characters requiring encoding, has two distinct encoding methods: '+' and '%20'.
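As a rough illustration of what percent-encoding does, the sketch below converts a string to its UTF-8 bytes and writes every byte outside the RFC 3986 "unreserved" set as '%' followed by two hexadecimal digits. The helper name percentEncode is hypothetical and exists only for this example; in practice a built-in such as encodeURIComponent() would normally be used.

```javascript
// Hypothetical helper, for illustration only: percent-encode every UTF-8 byte
// that is not an RFC 3986 "unreserved" character (letters, digits, - . _ ~).
function percentEncode(text) {
  const unreserved = /^[A-Za-z0-9\-._~]$/;
  let out = '';
  for (const byte of new TextEncoder().encode(text)) {
    const ch = String.fromCharCode(byte);
    out += unreserved.test(ch)
      ? ch
      : '%' + byte.toString(16).toUpperCase().padStart(2, '0');
  }
  return out;
}

console.log(percentEncode('hello world')); // hello%20world
console.log(percentEncode('caf\u00e9'));   // caf%C3%A9 (the accented e is two UTF-8 bytes)
```

Note that this generic scheme always yields '%20' for a space; the '+' form comes from a separate convention.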
Historical Context and Specification Differences
This encoding discrepancy stems from the historical development of the internet. The URI percent-encoding specifications require a space to be encoded as '%20', and this remains the form defined by the generic URI syntax (RFC 3986). However, during the evolution of HTML form data submission, a modified encoding scheme emerged.
According to the W3C HTML4 specification, when HTML form data is submitted via GET or POST methods, it defaults to using the application/x-www-form-urlencoded encoding format. This format is based on early URI percent-encoding rules but includes significant modifications: replacing spaces with '+' instead of '%20', along with other changes such as newline normalization.
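The form encoding described above can be observed with the standard URLSearchParams API, which serializes its contents as application/x-www-form-urlencoded. The field names q and expr below are made up for the example.

```javascript
// Build a form-style payload; 'q' and 'expr' are hypothetical field names.
const form = new URLSearchParams();
form.append('q', 'hello world');
form.append('expr', '1+1');

// toString() serializes as application/x-www-form-urlencoded:
// spaces become '+', and a literal '+' becomes '%2B'.
const body = form.toString();
console.log(body); // q=hello+world&expr=1%2B1
```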
Encoding Rules for Different URL Components
The structural complexity of URLs necessitates different encoding rules for various components. A complete URL can be decomposed into multiple parts:
https://user:pass@example.com:8080/path/file;param=value?query=string#fragment
In the path component (before '?'), spaces must be encoded as '%20' and should never use '+'. This is because the path component follows strict URI standard encoding rules. For example, the path '/documents/my file.txt' should be encoded as '/documents/my%20file.txt'.
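A minimal sketch of path encoding, assuming the path is available as a plain, not yet encoded string: each segment is encoded separately so that the '/' separators are preserved. The helper name encodePath is hypothetical.

```javascript
// Hypothetical helper: encode each path segment on its own so that the
// '/' separators survive while spaces become %20.
function encodePath(path) {
  return path.split('/').map((segment) => encodeURIComponent(segment)).join('/');
}

console.log(encodePath('/documents/my file.txt')); // /documents/my%20file.txt

// In a path, '+' is an ordinary character: decoding leaves it untouched.
console.log(decodeURIComponent('/documents/my+file.txt')); // /documents/my+file.txt
```

The second call shows why '+' must not be used for spaces in a path: a standard URI decoder will never turn it back into a space.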
In the query string component (after '?'), the situation differs. For historical reasons, servers that parse the query as application/x-www-form-urlencoded data accept either '+' or '%20' as a space. This flexibility has an important consequence: a literal '+' character must itself be encoded as '%2B' in query strings, otherwise it will be decoded as a space.
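The decoding side of this rule can be sketched as follows. The helper name decodeQueryValue is hypothetical; it assumes the input is a single raw query value that has already been split off from its key.

```javascript
// Hypothetical helper: decode one form-style query value.
// '+' is turned into a space first, then percent-escapes are decoded,
// so an encoded plus (%2B) survives as a literal '+'.
function decodeQueryValue(raw) {
  return decodeURIComponent(raw.replace(/\+/g, ' '));
}

console.log(decodeQueryValue('blue+light'));   // blue light
console.log(decodeQueryValue('blue%20light')); // blue light
console.log(decodeQueryValue('blue%2Blight')); // blue+light
```

The order of the two steps matters: decoding first and replacing '+' afterwards would wrongly convert the '+' produced by '%2B' into a space.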
Practical Implementation Examples
Consider the encoding of a string containing spaces 'blue light blue' in different URL components:
// Path uses %20 for spaces; query uses + for spaces (a literal + would be %2B)
http://example.com/blue%20light%20blue?blue+light+blue
In the path '/blue light blue', both spaces are encoded as '%20'. In the query string '?blue+light+blue', spaces are encoded as '+', but if the string contains actual '+' characters, such as 'blue+light', the '+' must be encoded as '%2B', resulting in 'blue%2Blight'.
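The example URL can be reproduced in code. This is a sketch under the assumption that form-style output is wanted for the query; the helper name formEncode is hypothetical and simply post-processes the result of encodeURIComponent().

```javascript
// Hypothetical helper: form-style encoding built on encodeURIComponent().
// encodeURIComponent() already turns '+' into %2B, so the only '+' left
// after the replacement are the ones that stand for spaces.
function formEncode(text) {
  return encodeURIComponent(text).replace(/%20/g, '+');
}

const path = '/' + encodeURIComponent('blue light blue');
const url = 'http://example.com' + path + '?' + formEncode('blue light blue');

console.log(url);                      // http://example.com/blue%20light%20blue?blue+light+blue
console.log(formEncode('blue+light')); // blue%2Blight
```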
Implementation Variations Across Programming Languages
Different programming languages provide various URL encoding functions with differing space handling:
// JavaScript example
let original = 'hello world';
let encoded1 = encodeURIComponent(original); // 'hello%20world'
let encoded2 = original.replace(/ /g, '+'); // 'hello+world'
// PHP example
$original = 'hello world';
$encoded1 = urlencode($original); // 'hello+world'
$encoded2 = rawurlencode($original); // 'hello%20world'
JavaScript's encodeURIComponent() function follows standard URI percent-encoding, converting spaces to '%20'; the replace() call above shows only how the '+' form looks and is not a complete form encoder. PHP offers two functions: urlencode() uses '+' for spaces (compatible with form submission), while rawurlencode() uses '%20' (RFC 3986 compliant).
Modern Web Development Best Practices
In contemporary web development, the following principles are recommended:
- Always use '%20' for space encoding in URL path components
- In query strings, prefer '%20' for consistency, but handle potential '+' encoding from received data
- When generating URLs, explicitly specify encoding methods rather than relying on browser auto-encoding
- When parsing URLs, ensure proper handling of both encoding formats
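For the parsing side of these recommendations, one option is to let the standard URLSearchParams parser do the decoding, since it accepts both space encodings. The parameter names q and tag are made up for this sketch.

```javascript
// The same logical query, written with '+' and with '%20' for the space.
const plusStyle = new URLSearchParams('q=hello+world&tag=c%2B%2B');
const percentStyle = new URLSearchParams('q=hello%20world&tag=c%2B%2B');

console.log(plusStyle.get('q'));    // hello world
console.log(percentStyle.get('q')); // hello world
console.log(plusStyle.get('tag'));  // c++ (encoded plus signs stay literal)
```

Relying on the parser avoids hand-written replacement logic, which is easy to apply in the wrong order or to apply twice.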
The following function sketches this approach for an absolute URL: it normalizes spaces to '%20' in both the path and the query string.
function encodeURLComponents(url) {
  // new URL() parses the string and already percent-encodes the path,
  // so spaces in urlObj.pathname appear as %20 (re-encoding would double-encode)
  const urlObj = new URL(url);
  const encodedPath = urlObj.pathname;
  // URLSearchParams decodes both '+' and '%20' to a space while parsing,
  // and decodes %2B to a literal '+'
  const params = new URLSearchParams(urlObj.search);
  // Re-encode each pair exactly once: spaces become %20 and '+' becomes %2B
  const pairs = [];
  params.forEach((value, key) => {
    pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(value)}`);
  });
  const query = pairs.length > 0 ? `?${pairs.join('&')}` : '';
  return `${urlObj.origin}${encodedPath}${query}${urlObj.hash}`;
}
Compatibility and Standardization Trends
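Despite this historical discrepancy, both forms remain in active use. '%20' is valid in every URL component, whereas '+' stands for a space only where the application/x-www-form-urlencoded convention applies. Form-oriented APIs such as URLSearchParams still emit '+' for compatibility with form submission, while general-purpose encoders such as encodeURIComponent() and rawurlencode() emit '%20'.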
Developers working with URL encoding should understand this historical context but are advised to prioritize standard encoding methods in new projects to ensure long-term maintainability and cross-platform compatibility.