Keywords: PowerShell | UTF-8 | Character Encoding
Abstract: This article provides an in-depth exploration of various methods to change the default output encoding in PowerShell to UTF-8, including the use of the $PSDefaultParameterValues variable, profile configurations, and differences across PowerShell versions. It analyzes the encoding handling disparities between Windows PowerShell and PowerShell Core, offers detailed code examples and setup steps, and addresses file encoding inconsistencies to ensure cross-platform script compatibility and stability.
Overview of PowerShell Encoding Issues
In PowerShell, the default output encoding can cause garbled characters or compatibility problems when files are processed by other tools. In Windows PowerShell in particular, the redirection operators > and >> produce UTF-16LE files by default, which is impractical in many scenarios, especially when interacting with Unix tools or cross-platform applications. This article explains how to change the default encoding to UTF-8, covering solutions from temporary adjustments to persistent configurations.
Setting Default Encoding with $PSDefaultParameterValues
PowerShell provides the $PSDefaultParameterValues preference variable, allowing users to set default values for cmdlet parameters. For output encoding, the following command can be used to set the default encoding for Out-File and related operations to UTF-8:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
In PowerShell 5.1 or later, this setting also affects the > and >> operators, because they are effectively aliases of Out-File. To apply UTF-8 to every cmdlet that supports an -Encoding parameter, use a wildcard in place of the cmdlet name:
$PSDefaultParameterValues['*:Encoding'] = 'utf8'
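$PSDefaultParameterValues is available in PowerShell 3.0 and above; in versions before 5.1 it applies to explicit Out-File calls but not to > and >>. Note that in Windows PowerShell the utf8 value produces UTF-8 files with a BOM, whereas in PowerShell Core the same value produces BOM-less UTF-8 files.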
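A quick way to confirm what the setting produces on a given machine is to write a small file and inspect its first bytes. The following is a minimal sketch; the temporary file name and the variable names are illustrative assumptions, not part of any PowerShell API.

```powershell
# Make UTF-8 the default for Out-File (and, in PowerShell 5.1+, for > and >>).
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

# Write a test file containing a non-ASCII character (e with acute accent).
# [char]0xE9 is used so the sample does not depend on this script's own encoding.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding-test.txt'   # illustrative name
"h$([char]0xE9)llo" | Out-File -FilePath $path

# Read the raw bytes and check for the UTF-8 BOM (EF BB BF).
$bytes = [System.IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and
          $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF

$firstThree = ($bytes | Select-Object -First 3 | ForEach-Object { $_.ToString('X2') }) -join ' '
"First bytes: $firstThree"
"UTF-8 BOM present: $hasBom"

Remove-Item -LiteralPath $path
```

Based on the behavior described above, Windows PowerShell should report a BOM and PowerShell Core should not.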
Persistent Configuration and Profile Files
To apply the encoding settings in every PowerShell session, add the relevant commands to your user profile file, typically located at \Users\me\Documents\WindowsPowerShell\profile.ps1 (the automatic variable $PROFILE exposes the profile paths that PowerShell actually loads). For example, add the following line:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
With this in place, output encoding is set to UTF-8 each time PowerShell starts. Inside a script or function, define a local copy of the variable instead, so that the change applies only to that scope and its descendants and leaves the caller's session untouched:
$local:PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }
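The scoping behavior can be sketched as follows. The function name Export-Report and the temporary file name are hypothetical, chosen only for illustration.

```powershell
function Export-Report {   # hypothetical function name, for illustration only
    param([Parameter(Mandatory)] [string] $Path)

    # Local copy: shadows the session-wide table inside this function
    # and in the scopes it creates, but not in the caller.
    $local:PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }

    'report line' | Out-File -FilePath $Path
}

# Record the caller's defaults before and after the call.
$before = $PSDefaultParameterValues.Count
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'report-test.txt'   # illustrative name
Export-Report -Path $tmp
$after = $PSDefaultParameterValues.Count

"Caller's defaults changed: $($before -ne $after)"
Remove-Item -LiteralPath $tmp
```

Because the assignment happens in the function's own scope, the caller's $PSDefaultParameterValues keeps whatever entries it had before the call.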
Differences Between Windows PowerShell and PowerShell Core
Windows PowerShell (up to version 5.1) and PowerShell Core (v6+) handle encoding quite differently. In Windows PowerShell, passing utf8 to the -Encoding parameter generates UTF-8 files with a BOM, which may be incompatible with some Unix tools. In contrast, PowerShell Core defaults to BOM-less UTF-8 files, aligning with cross-platform conventions. To generate files with a BOM in PowerShell Core, pass utf8BOM as the -Encoding value.
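When the same script must produce BOM-less UTF-8 in both editions, one common workaround is to branch on the PowerShell version. The sketch below is an assumption-laden illustration rather than a prescribed method: Write-Utf8NoBom is a hypothetical helper name, the utf8NoBOM value is accepted only by PowerShell Core (v6+), and the Windows PowerShell branch falls back to .NET's System.IO.File API.

```powershell
function Write-Utf8NoBom {   # hypothetical helper name
    param(
        [Parameter(Mandatory)] [string]   $Path,    # pass a full path (see note below)
        [Parameter(Mandatory)] [string[]] $Lines
    )

    if ($PSVersionTable.PSVersion.Major -ge 6) {
        # PowerShell Core: 'utf8NoBOM' is an accepted -Encoding value.
        Set-Content -LiteralPath $Path -Value $Lines -Encoding utf8NoBOM
    }
    else {
        # Windows PowerShell: -Encoding utf8 always writes a BOM, so use .NET
        # with a UTF8Encoding instance constructed to omit it.
        $utf8NoBom = New-Object System.Text.UTF8Encoding $false
        [System.IO.File]::WriteAllLines($Path, $Lines, $utf8NoBom)
    }
}

# Usage: write two lines, then look at the first byte of the file.
$out = Join-Path ([System.IO.Path]::GetTempPath()) 'nobom-test.txt'   # illustrative name
Write-Utf8NoBom -Path $out -Lines 'first', 'second'
$head = [System.IO.File]::ReadAllBytes($out)[0]   # 0x66 ('f'), not 0xEF
Remove-Item -LiteralPath $out
```

The .NET call resolves relative paths against the process working directory, which can differ from PowerShell's current location, so a full path is the safer input.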
Other Encoding-Related Settings
The $OutputEncoding variable controls how PowerShell encodes strings that it sends to external programs, and it has no effect on file output. For instance, setting $OutputEncoding = [System.Text.UTF8Encoding]::new() ensures strings sent to external programs are UTF-8 encoded without a BOM. Additionally, system-wide UTF-8 settings, such as the beta feature in Windows 10, change the ANSI code page; they therefore affect only cmdlets that default to ANSI encoding, such as Set-Content, and not Out-File.
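The effect of that assignment can be inspected directly on the encoding object. This is a small sketch using only standard .NET members; note that the ::new() syntax requires PowerShell 5.0 or later.

```powershell
# Encoding used when piping strings to external programs.
$OutputEncoding = [System.Text.UTF8Encoding]::new()

# Inspect the resulting encoding object.
$name = $OutputEncoding.WebName                      # 'utf-8'
$preambleLength = $OutputEncoding.GetPreamble().Length   # 0: no BOM is emitted

"Encoding: $name, BOM bytes: $preambleLength"
```

The parameterless UTF8Encoding constructor creates an instance that does not emit a BOM, which is usually what external programs reading from standard input expect.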
Encoding Inconsistencies and Solutions
Default encodings vary across cmdlets in Windows PowerShell. For example, Out-File and > use UTF-16LE, while Set-Content uses ANSI encoding. When appending content, >> and Out-File -Append do not check the existing file's encoding and simply apply their own default, whereas Add-Content detects the existing encoding and matches it. Therefore, when appending to files or handling files with mixed encodings, pass the -Encoding parameter explicitly to prevent corrupted output.
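One way to choose the right -Encoding value before appending with Out-File is to look at the existing file's BOM. The following is a rough sketch: Get-BomEncodingName is a hypothetical helper, not a built-in cmdlet, and it recognizes only the UTF-8 and UTF-16 BOMs. A file without a BOM cannot be classified this way.

```powershell
function Get-BomEncodingName {   # hypothetical helper, not a built-in cmdlet
    param([Parameter(Mandatory)] [string] $Path)

    # Read at most the first three bytes of the file.
    $buffer = New-Object byte[] 3
    $stream = [System.IO.File]::OpenRead($Path)
    try     { $read = $stream.Read($buffer, 0, 3) }
    finally { $stream.Dispose() }

    if ($read -ge 3 -and $buffer[0] -eq 0xEF -and $buffer[1] -eq 0xBB -and $buffer[2] -eq 0xBF) {
        return 'utf8'
    }
    if ($read -ge 2 -and $buffer[0] -eq 0xFF -and $buffer[1] -eq 0xFE) {
        return 'unicode'            # UTF-16LE (UTF-32LE is not distinguished here)
    }
    if ($read -ge 2 -and $buffer[0] -eq 0xFE -and $buffer[1] -eq 0xFF) {
        return 'bigendianunicode'   # UTF-16BE
    }
    return $null                    # no BOM: ANSI or BOM-less UTF-8, undecidable here
}

# Usage: create a UTF-16LE file, then append using the detected encoding.
$log = Join-Path ([System.IO.Path]::GetTempPath()) 'append-test.log'   # illustrative name
[System.IO.File]::WriteAllText($log, "first`r`n", [System.Text.Encoding]::Unicode)
$enc = Get-BomEncodingName -Path $log
if ($enc) { 'second' | Out-File -FilePath $log -Append -Encoding $enc }
Remove-Item -LiteralPath $log
```

The returned names are values that -Encoding accepts in both Windows PowerShell and PowerShell Core, so they can be passed through unchanged.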
Cross-Platform Considerations
On Unix-like systems or when using cross-platform editors like Visual Studio Code, script files are often BOM-less. PowerShell Core handles such files correctly, but Windows PowerShell may misinterpret them as ANSI-encoded, leading to garbled text. For scripts that contain non-ASCII characters and must run in Windows PowerShell, it is recommended to save them explicitly as UTF-8 with a BOM. The trade-off is that BOM-less files are more compatible with Unix tools, which may treat a BOM as part of the data.
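To find scripts at risk of being misread by Windows PowerShell, one option is to flag files that contain non-ASCII bytes but start with no BOM. The sketch below assumes a hypothetical helper named Test-NeedsBom and reads each file fully into memory, which is acceptable for typical script sizes.

```powershell
function Test-NeedsBom {   # hypothetical helper name
    param([Parameter(Mandatory)] [string] $Path)

    $bytes = [System.IO.File]::ReadAllBytes($Path)

    $hasUtf8Bom  = $bytes.Length -ge 3 -and
                   $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
    $hasUtf16Bom = $bytes.Length -ge 2 -and (
                   ($bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) -or
                   ($bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF))

    # Any byte above 0x7F means the file is not pure ASCII.
    $hasNonAscii = $false
    foreach ($b in $bytes) {
        if ($b -gt 0x7F) { $hasNonAscii = $true; break }
    }

    return ($hasNonAscii -and -not ($hasUtf8Bom -or $hasUtf16Bom))
}

# Usage: the same script text saved without and with a UTF-8 BOM.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-check.ps1'   # illustrative name
$text   = "'caf$([char]0xE9)'"

[System.IO.File]::WriteAllText($sample, $text, (New-Object System.Text.UTF8Encoding $false))
$flagged = Test-NeedsBom -Path $sample   # $true: non-ASCII content, no BOM

[System.IO.File]::WriteAllText($sample, $text, (New-Object System.Text.UTF8Encoding $true))
$fixed = Test-NeedsBom -Path $sample     # $false: BOM present

Remove-Item -LiteralPath $sample
```

Pure-ASCII scripts are never flagged, since they read the same under ANSI and UTF-8.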
Summary and Best Practices
By appropriately configuring $PSDefaultParameterValues and user profiles, you can efficiently set PowerShell's default output encoding to UTF-8. For cross-platform development, prefer PowerShell Core for consistent UTF-8 behavior. In critical scripts, always specify encoding parameters and use local variables to avoid global impacts, ensuring compatibility and maintainability.