Keywords: HTTP | Content-Disposition | Filename Encoding | RFC 5987 | Browser Compatibility
Abstract: This paper thoroughly examines the encoding challenges of filename parameters in HTTP Content-Disposition headers. Addressing RFC 2183's US-ASCII character set limitations, it analyzes the UTF-8 encoding scheme proposed in RFC 5987 and its implementation variations across major browsers. Through detailed encoding examples and browser compatibility testing, practical encoding strategies are provided to assist developers in correctly handling filename downloads containing non-ASCII characters.
Introduction
In modern web applications, forcing resource downloads instead of inline display is a common requirement, achieved through the Content-Disposition header in HTTP responses. The filename parameter of this header suggests a name for the file when downloaded by the browser. However, RFC 2183 explicitly restricts this parameter to the US-ASCII character set, creating significant limitations in practical applications.
RFC Specification Evolution
The original RFC 2183 states that filename parameters should follow RFC 2045 syntax, limited to US-ASCII characters. While the document acknowledges the need for arbitrary character set support, it does not define specific mechanisms.
RFC 2184, later replaced by RFC 2231, provides a more comprehensive parameter encoding mechanism for MIME messages. RFC 5987 then defines a character set and language encoding scheme specifically for HTTP header field parameters, forming the basis for handling non-ASCII filenames.
RFC 5987 Encoding Scheme
RFC 5987 defines the extended parameter syntax behind the filename* parameter (RFC 6266 applies it to Content-Disposition), which supports UTF-8 through a specific format:
Content-Disposition: attachment; filename*=UTF-8''Na%C3%AFve%20file.txt
Here, UTF-8 names the character set, the language tag goes between the two single quotes (it may be empty, as here), and the remainder is the percent-encoded UTF-8 byte sequence of the filename. This format can represent any Unicode character, such as the "ï" (U+00EF, the third character of "Naïve") in the example, which is encoded as %C3%AF.
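The encoding rule described above can be sketched in a few lines of C#. This is a minimal illustration, not a reference implementation: the class name Rfc5987 and the method EncodeExtValue are hypothetical, and the set of characters left unencoded is the attr-char set from RFC 5987 as understood here.

```csharp
using System;
using System.Text;

public static class Rfc5987
{
    // attr-char per RFC 5987: characters that may appear unencoded in the value.
    private const string AttrChars =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~";

    // Hypothetical helper: returns UTF-8'<language>'<percent-encoded UTF-8 bytes>.
    public static string EncodeExtValue(string value, string language = "")
    {
        var sb = new StringBuilder("UTF-8'" + language + "'");
        foreach (byte b in Encoding.UTF8.GetBytes(value))
        {
            char c = (char)b;
            if (b < 0x80 && AttrChars.IndexOf(c) >= 0)
                sb.Append(c);                             // allowed as-is
            else
                sb.Append('%').Append(b.ToString("X2"));  // percent-encode the byte
        }
        return sb.ToString();
    }
}
```

Calling Rfc5987.EncodeExtValue("Naïve file.txt") yields UTF-8''Na%C3%AFve%20file.txt, the value used in the header above.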
Browser Compatibility Analysis
Despite RFC 5987 providing a standard scheme, browser implementations vary:
- Modern Browsers (Chrome, Firefox, Edge): Generally support the filename* parameter and correctly handle UTF-8 encoding.
- Safari: Some versions may directly support UTF-8 encoded filename parameters without special syntax.
- Legacy IE (e.g., IE8): Require percent-encoded filename parameters but do not fully adhere to RFC 5987 format.
For maximum compatibility, it is recommended to provide both filename and filename* parameters:
Content-Disposition: attachment; filename="naive file.txt"; filename*=UTF-8''Na%C3%AFve%20file.txt
Browsers not supporting RFC 5987 will ignore filename* and use the fallback ASCII approximation.
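One way to produce both parameters from a single Unicode name is sketched below. The names ContentDispositionBuilder, ToAsciiFallback, and Build are hypothetical, and stripping diacritics is only one possible way to derive the ASCII approximation. The sketch also assumes a runtime where Uri.EscapeDataString follows RFC 3986 (.NET Framework 4.5 or later, or .NET Core); older versions are understood to leave characters such as ' ( ) * unescaped, which would not be valid in filename*.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class ContentDispositionBuilder
{
    // Hypothetical helper: ASCII approximation for the plain filename parameter.
    public static string ToAsciiFallback(string fileName)
    {
        var sb = new StringBuilder();
        // FormD splits "ï" into "i" plus a combining diaeresis, which is dropped below.
        foreach (char c in fileName.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;
            // Keep printable ASCII, except characters that are awkward inside a
            // quoted-string or that some browsers may treat as escapes.
            bool safe = c >= 0x20 && c < 0x7F && c != '"' && c != '\\' && c != '%';
            sb.Append(safe ? c : '_');
        }
        return sb.ToString();
    }

    // Builds the header value with an ASCII fallback and the filename* form.
    public static string Build(string fileName)
    {
        return "attachment; filename=\"" + ToAsciiFallback(fileName) + "\""
             + "; filename*=UTF-8''" + Uri.EscapeDataString(fileName);
    }
}
```

For the input "naïve file.txt", Build returns attachment; filename="naive file.txt"; filename*=UTF-8''na%C3%AFve%20file.txt. Characters with no ASCII decomposition (for example "€") become underscores in the fallback but are preserved in filename*.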
Practical Encoding Examples
The following C# code demonstrates how to dynamically generate appropriate Content-Disposition headers based on browser type:
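// Request and Response are the ASP.NET request/response objects; fileName is the desired download name.
string contentDisposition;
// Escape backslashes and double quotes so the name forms a valid quoted-string.
string quotedName = "\"" + fileName.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";
if (Request.Browser.Browser == "IE" && (Request.Browser.Version == "7.0" || Request.Browser.Version == "8.0"))
    // IE7/IE8: percent-encoded UTF-8 in the plain filename parameter.
    contentDisposition = "attachment; filename=" + Uri.EscapeDataString(fileName);
else if (Request.Browser.Browser == "Safari")
    // Safari: raw UTF-8 in the plain filename parameter, quoted so that spaces survive.
    contentDisposition = "attachment; filename=" + quotedName;
else
    // Other browsers: quoted filename plus the RFC 5987 filename* parameter.
    contentDisposition = "attachment; filename=" + quotedName + "; filename*=UTF-8''" + Uri.EscapeDataString(fileName);
Response.AddHeader("Content-Disposition", contentDisposition);
This code addresses the different requirements of legacy IE versions, Safari, and modern browsers. Because it relies on user-agent detection, it should be retested as browser behavior changes.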
Special Handling for Android Devices
Android's built-in download manager has limitations in parsing filenames, requiring additional processing:
// Requires: using System.Collections.Generic;
// Characters treated as safe for Android's download manager; everything else is replaced.
private static readonly HashSet<char> AndroidAllowedChars = new HashSet<char>("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ._-+,@£$€!½§~'=()[]{}0123456789");
private string MakeAndroidSafeFileName(string fileName)
{
    char[] newFileName = fileName.ToCharArray();
    for (int i = 0; i < newFileName.Length; i++)
    {
        // Replace any character outside the allow-list with an underscore.
        if (!AndroidAllowedChars.Contains(newFileName[i]))
            newFileName[i] = '_';
    }
    return new string(newFileName);
}
This function replaces every character outside the allow-list (including spaces, which are not in the list) with an underscore, improving compatibility with Android devices.
Alternative Approaches
Beyond header encoding, filenames can be implied through URL paths:
/download_script.php/na%C3%AFve_file.txt
Browsers typically use the last part of the URL as the default filename, eliminating the need for a Content-Disposition header. This method offers excellent compatibility but requires server support for URL rewriting to hide the actual script path.
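Generating such a link amounts to percent-encoding the filename as the final path segment. The following is a small sketch under that assumption; the DownloadUrl class and its Build method are hypothetical names, and the script path is taken from the example above.

```csharp
using System;

public static class DownloadUrl
{
    // Hypothetical helper: appends the percent-encoded file name as the last path segment.
    public static string Build(string scriptPath, string fileName)
    {
        // EscapeDataString also encodes "/" so the name stays a single segment.
        return scriptPath.TrimEnd('/') + "/" + Uri.EscapeDataString(fileName);
    }
}
```

DownloadUrl.Build("/download_script.php", "naïve_file.txt") produces /download_script.php/na%C3%AFve_file.txt.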
Conclusion
Filename encoding in HTTP Content-Disposition headers is a complex yet crucial issue. RFC 5987 provides a standard UTF-8 support scheme, but browser compatibility variations call for a progressive enhancement strategy. By combining the filename and filename* parameters and applying special handling for specific environments such as Android, filenames can be delivered correctly across most platforms. Continued monitoring of browser updates and of the relevant standards should allow these workarounds to be simplified over time.