Keywords: Windows file name validation | C# implementation | regular expressions
Abstract: This article examines how to validate file names on Windows. It begins by outlining the core rules from the MSDN documentation, including prohibited characters and DOS reserved names. It then turns to the System.IO.Path methods GetInvalidFileNameChars and GetInvalidPathChars in C#, noting that the character arrays they return may be incomplete. A validation example based on regular expressions is provided, along with notes on differences across .NET Framework versions. Finally, additional considerations such as path length limits and Unicode support are summarized for practical applications.
Overview of Windows File Naming Rules
In Windows operating systems, validating file names is a common but often overlooked task. According to the MSDN document "Naming a File or Directory," a legal file name may use any character in the current code page, including Unicode characters and characters in the extended character set (128 to 255), except for the following reserved characters: < > : " / \ | ? *. Characters whose integer representations are 0 through 31 (that is, below the ASCII space) are also not allowed, and the target file system may disallow further characters. Finally, a name should not end with a space or a period: although the underlying file system may support such names, the Windows shell and user interface do not.
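The character rules above can also be encoded directly, independent of any runtime-supplied table. The following is a minimal sketch assuming the goal is names usable in the Windows shell; the class WindowsNameChars and its methods are hypothetical names, and restrictions specific to a particular file system are not covered.

```csharp
// Hypothetical helper encoding the documented Windows character rules.
public static class WindowsNameChars
{
    // The nine reserved printable characters: < > : " / \ | ? *
    private const string Reserved = "<>:\"/\\|?*";

    // True if the name contains a reserved character or a control character (0-31).
    public static bool HasForbiddenChar(string name)
    {
        foreach (char c in name)
        {
            if (c < 32 || Reserved.IndexOf(c) >= 0)
                return true;
        }
        return false;
    }

    // True if the name ends with a space or a period, which the Windows shell does not support.
    public static bool HasTrailingSpaceOrPeriod(string name) =>
        name.Length > 0 && (name[name.Length - 1] == ' ' || name[name.Length - 1] == '.');
}
```

For example, HasForbiddenChar("report:v2.txt") is true because of the colon, while a name such as "résumé.txt" passes because characters above 127 are permitted.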
DOS Reserved Names and Special Restrictions
Windows reserves several DOS device names that cannot be used as file names: CON, PRN, AUX, NUL, COM0 through COM9, and LPT0 through LPT9. The comparison is case-insensitive, and the names remain reserved when followed by an extension, so AUX.txt and nul.tar.gz are invalid as well. A further restriction is that a file name cannot consist entirely of periods: . and .. denote the current and parent directories, and because Win32 path normalization strips trailing periods, a name such as ... cannot be created through the usual APIs.
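Since this article's focus is regular expressions, the reserved-name rule can be written as a single case-insensitive pattern that also accepts an optional extension. This is a sketch, assuming that only the device names listed above need to be rejected; the ReservedNames class and its methods are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for the DOS device-name and all-periods rules.
public static class ReservedNames
{
    // Matches CON, PRN, AUX, NUL, COM0-COM9 and LPT0-LPT9, optionally followed by
    // an extension (e.g. "AUX.txt", "nul.tar.gz"), ignoring case.
    private static readonly Regex DeviceName = new Regex(
        @"^(CON|PRN|AUX|NUL|COM[0-9]|LPT[0-9])(\..*)?$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static bool IsDeviceName(string name) => DeviceName.IsMatch(name);

    // True for names such as ".", ".." or "..." that consist only of periods.
    public static bool IsAllPeriods(string name) =>
        name.Length > 0 && name.All(c => c == '.');
}
```

Anchoring the pattern with ^ and $ keeps names such as CONSOLE.txt or COM10 from being flagged, since only the exact device name, optionally followed by a period and an extension, matches.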
File Name Validation Methods in C#
In C#, the System.IO.Path class provides static methods to retrieve invalid characters. Path.GetInvalidFileNameChars() returns an array of characters that are invalid in file names, and Path.GetInvalidPathChars() returns the characters that are invalid in paths. However, the documentation states that the returned array is not guaranteed to contain the complete set of invalid characters, because the full set can vary by file system. Practical validation therefore needs additional rules.
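The difference between the two methods is most visible with directory separators, which are invalid inside a single file name but legal in a path. The sketch below uses string.IndexOfAny instead of a regex; the NameChecks class is a hypothetical name, and the exact arrays returned depend on the platform and runtime version.

```csharp
using System.IO;

// Hypothetical checks built directly on the System.IO.Path character tables.
public static class NameChecks
{
    // Rejects any character reported by Path.GetInvalidFileNameChars(),
    // which includes '/' on every platform and '\\' on Windows.
    public static bool HasInvalidFileNameChar(string name) =>
        name.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0;

    // Rejects any character reported by Path.GetInvalidPathChars();
    // separators are not in this set because paths legitimately contain them.
    public static bool HasInvalidPathChar(string path) =>
        path.IndexOfAny(Path.GetInvalidPathChars()) >= 0;
}
```

So a single name typed by a user should be checked with GetInvalidFileNameChars(), while GetInvalidPathChars() is the appropriate filter for a whole path.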
Validation Using Regular Expressions
A common approach to validation involves using regular expressions. Below is an example code snippet demonstrating how to construct a regex using Path.GetInvalidFileNameChars():
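using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

bool IsValidFilename(string testName)
{
    if (string.IsNullOrEmpty(testName))
        return false;

    // Construct a character class matching any invalid character.
    // Regex.Escape makes \, *, ?, | and similar metacharacters literal; it does not
    // escape ']' or '-', which is safe here only because neither is in the set.
    string invalidChars = Regex.Escape(new string(Path.GetInvalidFileNameChars()));
    Regex containsBadChar = new Regex($"[{invalidChars}]");
    if (containsBadChar.IsMatch(testName))
        return false;

    // Check for DOS reserved names. They stay reserved with an extension
    // (e.g. AUX.txt), so compare only the part before the first period.
    string[] reservedNames = { "CON", "PRN", "AUX", "NUL",
        "COM0", "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT0", "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9" };
    int dot = testName.IndexOf('.');
    string baseName = dot >= 0 ? testName.Substring(0, dot) : testName;
    if (reservedNames.Contains(baseName.ToUpperInvariant()))
        return false;

    // Ensure the file name is not all periods
    if (testName.All(c => c == '.'))
        return false;

    return true;
}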
This code first checks whether the input is null or empty, then uses the regex to detect any invalid character. It then rejects DOS reserved names and names made up entirely of periods. It does not reject other names that end with a space or a period, so add that check if names must work in the Windows shell. Note also that the behavior of Path.GetInvalidFileNameChars() may differ in versions prior to .NET Framework 3.5, so real-world code should be checked against the target framework.
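Two adjustments to the function above are worth sketching: compiling the regex once rather than on every call, and adding the documented Windows characters explicitly, because on .NET Core and .NET 5+ running on Linux or macOS, Path.GetInvalidFileNameChars() returns only '\0' and '/'. The class CachedFileNameValidator is a hypothetical name, and the sketch covers only the character check; the reserved-name and all-periods checks would still be applied separately.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical validator whose character check behaves the same on every OS.
public static class CachedFileNameValidator
{
    // Union of the runtime's table with the characters Windows documents as reserved,
    // plus control characters 0-31, so the result does not depend on the host platform.
    private static readonly char[] InvalidChars = Path.GetInvalidFileNameChars()
        .Concat("<>:\"/\\|?*")
        .Concat(Enumerable.Range(0, 32).Select(i => (char)i))
        .Distinct()
        .ToArray();

    // Built once at type initialization instead of on every call.
    // Regex.Escape does not escape ']' or '-', neither of which is in the set.
    private static readonly Regex BadChar =
        new Regex("[" + Regex.Escape(new string(InvalidChars)) + "]");

    public static bool ContainsInvalidChar(string name) => BadChar.IsMatch(name);
}
```

With the union in place, a name such as a:b.txt is rejected regardless of the host operating system, which matters when an application on Linux prepares files destined for Windows.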
Additional Considerations
Beyond character validation, path length limits must be considered. In Windows, a full path (including the file name) is traditionally limited to MAX_PATH, which is 260 characters including the terminating null character, unless the \\?\ prefix is used. With the \\?\ prefix, the Unicode versions of the Windows API accept paths of approximately 32,767 characters; the limit is approximate because the system may expand the \\?\ prefix to a longer string at run time, and that expansion counts toward the total. Each path component is also limited, usually to 255 characters. Furthermore, individual file systems may impose additional restrictions, such as characters that are invalid only on specific file systems. It is therefore advisable to test file name validation against the file systems the application actually targets.
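The length rules can be checked before touching the file system. This is a minimal sketch assuming the classic limits described above: MAX_PATH of 260 including the terminating null, roughly 32,767 characters for \\?\ paths, and 255 characters per component. The PathLength class and its constants are hypothetical, and the sketch ignores the opt-in long-path support available since Windows 10 version 1607.

```csharp
using System;

// Hypothetical length checks modeled on the classic Win32 limits.
public static class PathLength
{
    public const int MaxPath = 260;            // MAX_PATH, counting the terminating null
    public const int MaxExtendedPath = 32767;  // approximate limit for \\?\ paths
    public const int MaxComponent = 255;       // typical per-component limit

    public static bool IsWithinLimits(string fullPath)
    {
        bool extended = fullPath.StartsWith(@"\\?\", StringComparison.Ordinal);
        int limit = extended ? MaxExtendedPath : MaxPath;

        // Treat both limits as counting a terminating null: exact for MAX_PATH,
        // a conservative assumption for the extended limit.
        if (fullPath.Length + 1 > limit)
            return false;

        // Each component between separators must also fit the per-component limit.
        foreach (string part in fullPath.Split('\\', '/'))
        {
            if (part.Length > MaxComponent)
                return false;
        }
        return true;
    }
}
```

Pass the result of Path.GetFullPath rather than a bare file name, because a relative name says nothing about the length of the directory it will be created in.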
Summary and Best Practices
Validating Windows file names requires a multifaceted approach. First, rely on Path.GetInvalidFileNameChars() and Path.GetInvalidPathChars() to obtain invalid characters, but be aware that these methods may be incomplete. Second, use regular expressions for rapid matching and incorporate additional checks for DOS reserved names and special rules. Finally, account for path length and file system-specific constraints. In practice, developing comprehensive test cases covering various edge scenarios is recommended to ensure reliability. By adhering to these best practices, one can effectively implement file name validation in C# applications, enhancing user experience and system stability.