Keywords: C++ identifiers | underscore rules | naming conventions | reserved names | POSIX standard
Abstract: This article explores the C++ standard's rules for underscores in identifiers, analyzing reserved patterns such as double underscores and an underscore followed by an uppercase letter. Through code examples and references to the standard, it clarifies which names are reserved in the global namespace and which are reserved in every scope, extends the discussion to the POSIX standard, and provides naming guidelines for C++ developers.
Introduction
In C++ programming practice, naming conventions for member variables have long been a significant topic. Many developers use a prefix to distinguish member variables from local variables and parameters, such as the m_foo form familiar to developers with an MFC background, or the occasionally seen myFoo. In C# and .NET code, a single-underscore prefix such as _foo is commonly recommended for private fields. But is that usage permitted by the C++ standard? This article examines the C++ standard's rules for underscores in identifiers.
Underscore Rules in the C++ Standard
The C++ standard explicitly restricts which identifiers a program may use, and these rules did not change in C++11. Certain identifier patterns are reserved for the implementation (the compiler and its standard library). A program that declares or defines a name where it is reserved has undefined behavior, so developers should avoid these patterns to prevent conflicts with compiler or library internals.
Identifiers Reserved in Any Scope
The following identifier patterns are reserved in any scope, including for implementation macros:
- Identifiers starting with a single underscore followed immediately by an uppercase letter, e.g., _Foo or _BAR. This pattern is often used internally by compilers and system headers, including for macros.
- Identifiers containing two consecutive underscores (a "double underscore") anywhere in the name, e.g., __variable or my__name. Such names are reserved for the compiler and standard library.
For example, declaring int _Value; or class MyClass { int member__var; }; has undefined behavior, because both names match reserved patterns, even though the second one is only a class member.
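The two any-scope rules are mechanical enough to express as a check. The following is a minimal sketch, not a standard or library facility: reserved_in_any_scope is a hypothetical helper name, ASCII identifiers are assumed, and other reserved names (keywords, standard-library names, and the global-namespace rule discussed next) are deliberately ignored. It relies on C++17 for constexpr std::string_view operations.

```cpp
#include <string_view>

// ASCII-only uppercase test, usable in constant expressions.
constexpr bool is_ascii_upper(char c) { return c >= 'A' && c <= 'Z'; }

// Hypothetical helper (not a standard API): true if `name` matches one of
// the two patterns the C++ standard reserves in every scope.
constexpr bool reserved_in_any_scope(std::string_view name) {
    // Pattern 1: a leading underscore followed by an uppercase letter (_Foo, _BAR).
    if (name.size() >= 2 && name[0] == '_' && is_ascii_upper(name[1])) {
        return true;
    }
    // Pattern 2: two consecutive underscores anywhere (__variable, my__name).
    return name.find("__") != std::string_view::npos;
}

// Compile-time checks mirroring the examples in the text.
static_assert(reserved_in_any_scope("_Foo"));
static_assert(reserved_in_any_scope("_BAR"));
static_assert(reserved_in_any_scope("__variable"));
static_assert(reserved_in_any_scope("my__name"));
static_assert(reserved_in_any_scope("member__var"));
static_assert(!reserved_in_any_scope("_localVar"));  // underscore + lowercase
static_assert(!reserved_in_any_scope("m_value"));
```

Note that "_localVar" passes this check; whether it is safe still depends on scope, which is the subject of the next rule.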
Identifiers Reserved in the Global Namespace
In the global namespace, all identifiers starting with an underscore are reserved for implementation use. This means that in the global scope, variables or functions like _globalVar should be avoided. For instance:
// Incorrect example: underscore-prefixed identifiers in global namespace
int _counter; // Reserved name: may collide with a name used by the implementation
void _helper() { /* ... */ } // Not recommended
In contrast, identifiers with a single underscore followed by a lowercase letter (e.g., _localVar) are not reserved outside the global namespace, for example in a function body, in class scope, or in a named namespace. Even so, many developers avoid them for consistency and to sidestep mistakes.
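To make the scope distinction concrete, here is a small sketch assuming a hypothetical namespace my_project and a hypothetical struct Widget. The underscore-plus-lowercase names are legal inside them; the same names at global scope are reserved and are therefore left commented out.

```cpp
// Global namespace: every name beginning with '_' is reserved here,
// so these declarations stay commented out.
// int _counter;
// void _helper() {}

namespace my_project {  // hypothetical project namespace
    int _counter = 0;   // underscore + lowercase: not reserved in a named namespace
    void _helper() { ++_counter; }
}  // namespace my_project

struct Widget {          // hypothetical class
    int _size = 0;       // class scope: not reserved either
    void grow() { ++_size; }
};
```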
Reservation Rules for the std Namespace
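The std namespace is reserved as a whole: a program may not add declarations or definitions to it, with narrow exceptions such as specializing a standard class template for a program-defined type. For example, declaring std::my_function has undefined behavior, whereas specializing std::hash for your own type is allowed. Specializing a standard template only for standard types, such as std::vector<int>, is not permitted.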
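A sketch of the permitted kind of addition, assuming a hypothetical program-defined type Point: specializing std::hash so that Point can be used as a key in std::unordered_set. The hash combination is intentionally simple and is only meant for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// A program-defined type (hypothetical, for illustration).
struct Point {
    int x;
    int y;
    bool operator==(const Point& other) const { return x == other.x && y == other.y; }
};

// Allowed: an explicit specialization of a standard template that depends
// on a program-defined type.
namespace std {
template <>
struct hash<Point> {
    std::size_t operator()(const Point& p) const noexcept {
        std::size_t hx = std::hash<int>{}(p.x);
        std::size_t hy = std::hash<int>{}(p.y);
        return hx ^ (hy << 1);  // simple combination, adequate for a sketch
    }
};
}  // namespace std

// Not allowed (undefined behavior): adding a brand-new name to std.
// namespace std { void my_function(); }
```

With the specialization in place, std::unordered_set<Point> works without a custom hasher argument.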
Association and Differences with the C Language Standard
C++ is based on C, and C++11 cites the C99 standard as a normative reference. C's reserved-identifier rules are similar to those of C++ but differ slightly. In C99:
- All identifiers starting with an underscore followed by an uppercase letter or another underscore are reserved for any use.
- All identifiers starting with an underscore are reserved for file-scope identifiers.
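For example, in C code, #define _DEBUG 1 might cause issues because _DEBUG could be used by the implementation. The key difference is that C reserves only a leading double underscore, while C++ reserves a double underscore anywhere in a name, so my__name is usable in C but reserved in C++. Developers should keep these differences in mind, especially in mixed C/C++ code bases.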
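The difference can be shown side by side. This is a sketch with hypothetical helper names; it encodes only the "reserved for any use" rules of each language (not C's file-scope rule or C++'s global-namespace rule) and assumes ASCII identifiers and C++17.

```cpp
#include <string_view>

constexpr bool ascii_upper(char c) { return c >= 'A' && c <= 'Z'; }

// C++ rule (any scope): "_" + uppercase prefix, or "__" anywhere in the name.
constexpr bool cpp_reserved_any_scope(std::string_view n) {
    return (n.size() >= 2 && n[0] == '_' && ascii_upper(n[1])) ||
           n.find("__") != std::string_view::npos;
}

// C99 rule (any use): "_" followed by an uppercase letter or another "_".
constexpr bool c_reserved_any_use(std::string_view n) {
    return n.size() >= 2 && n[0] == '_' && (ascii_upper(n[1]) || n[1] == '_');
}

// Both languages reserve _DEBUG; only C++ reserves an interior double underscore.
static_assert(cpp_reserved_any_scope("_DEBUG") && c_reserved_any_use("_DEBUG"));
static_assert(cpp_reserved_any_scope("my__name"));
static_assert(!c_reserved_any_use("my__name"));
```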
Additional Restrictions from the POSIX Standard
Beyond the C++ standard, the POSIX standard reserves many identifiers that may appear in normal code:
- Names starting with a capital E followed by a digit or uppercase letter (e.g., EINVAL), used for error codes.
- Names starting with is or to followed by a lowercase letter (e.g., isalpha), used for character testing and conversion functions.
- Names starting with LC_ followed by an uppercase letter (e.g., LC_TIME), used for locale attribute macros.
- Names starting with str, mem, or wcs followed by a lowercase letter (e.g., strcpy), used for string and array functions.
- Names starting with PRI or SCN followed by a lowercase letter or X, used for format specifier macros.
- Names ending with _t (e.g., size_t), used for type names, which are common in C++ but reserved by POSIX.
For instance, defining typedef int my_type_t; might cause conflicts on POSIX-compliant systems, even if it is safe in pure C++ environments. Developers targeting POSIX platforms should check their names against these patterns, because future revisions of the standard may add identifiers that match them.
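The patterns listed above can be expressed as a rough lint-style check. This is a sketch under stated assumptions: matches_posix_pattern and prefix_then are hypothetical helpers, identifiers are assumed to be ASCII, and the check ignores the fact that POSIX ties several of these reservations to particular headers or to external linkage. It does show how easily ordinary words such as total (to + lowercase) fall into a reserved pattern.

```cpp
#include <string_view>

constexpr bool ascii_lower(char c) { return c >= 'a' && c <= 'z'; }
constexpr bool ascii_upper(char c) { return c >= 'A' && c <= 'Z'; }
constexpr bool ascii_digit(char c) { return c >= '0' && c <= '9'; }

// True if `n` starts with `prefix` and the character after it satisfies `pred`.
template <typename Pred>
constexpr bool prefix_then(std::string_view n, std::string_view prefix, Pred pred) {
    return n.size() > prefix.size() && n.substr(0, prefix.size()) == prefix &&
           pred(n[prefix.size()]);
}

// Hypothetical helper: flags names matching the POSIX patterns listed above.
constexpr bool matches_posix_pattern(std::string_view n) {
    auto upper_or_digit = [](char c) { return ascii_upper(c) || ascii_digit(c); };
    auto lower_or_X = [](char c) { return ascii_lower(c) || c == 'X'; };
    return prefix_then(n, "E", upper_or_digit) ||                   // EINVAL
           prefix_then(n, "is", ascii_lower) ||                     // isalpha
           prefix_then(n, "to", ascii_lower) ||                     // toupper
           prefix_then(n, "LC_", ascii_upper) ||                    // LC_TIME
           prefix_then(n, "str", ascii_lower) ||                    // strcpy
           prefix_then(n, "mem", ascii_lower) ||                    // memcpy
           prefix_then(n, "wcs", ascii_lower) ||                    // wcslen
           prefix_then(n, "PRI", lower_or_X) ||                     // PRIx64
           prefix_then(n, "SCN", lower_or_X) ||                     // SCNd32
           (n.size() >= 2 && n.substr(n.size() - 2) == "_t");       // size_t
}

static_assert(matches_posix_pattern("my_type_t"));
static_assert(!matches_posix_pattern("MyType"));
```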
Practical Code Examples and Best Practices
Based on the above rules, the following examples demonstrate how to avoid reserved identifiers:
// Correct example: using safe naming conventions
#include <string> // needed for std::string
class MyClass {
private:
int m_value; // MFC-style, using m_ prefix
std::string name_; // Underscore suffix, avoiding prefix issues
public:
void setValue(int value) {
m_value = value; // Clear distinction for member variables
}
};
// Incorrect example: using reserved patterns
int _globalVar; // Global underscore prefix, not recommended
double __temp; // Double underscore, reserved for implementation
void _helperFunction() { // Global function with underscore prefix, potential conflict
// ...
}
In practice, many developers adopt a simple rule: never start an identifier with an underscore, and never use double underscores. Combining this with a unique project namespace (e.g., namespace my_project { ... }) further reduces the risk of conflicts, although namespaces offer no protection against macros, which are expanded before scopes are considered.
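Applying that rule to the incorrect examples above might look like the following sketch. The namespace my_project and the replacement names are hypothetical; the point is that none of them begins with an underscore or contains a double underscore, so they are safe in any scope.

```cpp
// Hypothetical project namespace; every name avoids a leading underscore
// and any double underscore.
namespace my_project {
    int global_var = 0;      // was: int _globalVar;
    double temp = 0.0;       // was: double __temp;

    void helper_function() { // was: void _helperFunction()
        ++global_var;
    }
}  // namespace my_project

// Caveat: namespaces do not scope macros. A macro named `temp` defined in
// an earlier header would still be substituted inside my_project.
```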
Modern Compiler Handling and Common Misconceptions
A common belief is that identifiers starting with underscores might "puzzle" the compiler, which may use such names for its own temporaries. In practice, modern compilers such as Microsoft Visual C++ and g++ accept most such code without errors. That does not make the standard's rules optional: undefined behavior may surface on a particular platform or with a future compiler or library version.
For example, a global declaration such as int _a; will likely compile with current compilers, but it works only because the implementation happens not to use that name, not because the standard guarantees it. The best practice is to follow the standard and avoid the pitfall altogether.
Conclusion
The use of underscores in C++ identifiers is governed by strict rules designed to prevent conflicts with implementations. Key recommendations include avoiding identifiers that start with an underscore or contain double underscores, understanding reservations in the global and std namespaces, and considering extensions like POSIX standards. By adopting consistent naming conventions, such as suffix underscores or explicit prefixes, developers can write portable and robust code. In complex projects, combining namespaces and code reviews effectively enforces these guidelines.