Writing UTF-8 Files Without BOM in PowerShell: Methods and Implementation

Nov 19, 2025 · Programming

Keywords: PowerShell | UTF-8 Encoding | Byte Order Mark | File Processing | .NET Framework

Abstract: This technical paper examines methods for writing UTF-8 encoded files without a Byte Order Mark (BOM) in PowerShell. After analyzing the encoding limitations of the Out-File command, it focuses on the core technique of using the .NET Framework's UTF8Encoding class and WriteAllLines method for BOM-free writing. It then compares several alternative approaches, including the New-Item command and a custom Out-FileUtf8NoBom function, and discusses encoding differences between PowerShell versions (Windows PowerShell vs. PowerShell Core). Code examples and performance recommendations are provided to help developers choose the most suitable implementation for their requirements.

PowerShell Encoding Fundamentals

In file encoding, the Byte Order Mark (BOM) is an important concept. A BOM is a special byte sequence at the beginning of a text file that identifies the file's encoding and, for UTF-16 and UTF-32, its byte order. For UTF-8, the BOM consists of the three bytes EF BB BF and serves only as an encoding signature, since UTF-8 has no byte-order variants. While a BOM can help some applications detect the encoding, BOM-less UTF-8 files are more widely compatible with modern applications and cross-platform tools, many of which treat the BOM bytes as unexpected content.
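Whether a given file starts with the UTF-8 BOM can be checked by reading its first three bytes. The following is a minimal sketch; the function name Test-Utf8Bom is hypothetical and not part of PowerShell.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns $true when the file
# starts with the three-byte UTF-8 BOM EF BB BF.
function Test-Utf8Bom {
    param([Parameter(Mandatory)] [string] $LiteralPath)

    # Resolve to a full path, because .NET does not follow PowerShell's current location.
    $fullPath = Convert-Path -LiteralPath $LiteralPath

    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $buffer = New-Object byte[] 3
        $read = $stream.Read($buffer, 0, 3)
    }
    finally {
        $stream.Dispose()
    }

    return ($read -eq 3 -and
            $buffer[0] -eq 0xEF -and
            $buffer[1] -eq 0xBB -and
            $buffer[2] -eq 0xBF)
}
```

Calling Test-Utf8Bom -LiteralPath $MyPath before and after a conversion is a quick way to confirm that the BOM is actually gone.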

Encoding Limitations of Out-File Command

PowerShell's Out-File command is a commonly used file output tool, but it has a specific limitation in encoding handling. In Windows PowerShell, Out-File with -Encoding "UTF8" always writes a BOM at the beginning of the file, and no parameter suppresses it. This may cause compatibility issues with other systems or applications that do not expect a BOM.

The following code example demonstrates the typical usage of Out-File and its BOM-related issues:

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "UTF8" $MyPath

Core Solution: Using .NET Framework for BOM-Free Writing

To solve the BOM issue, the most direct and effective method is to bypass PowerShell's built-in commands and directly utilize the .NET Framework's encoding capabilities. The .NET System.Text.UTF8Encoding class provides fine-grained encoding control.

Key Features of UTF8Encoding Class

The UTF8Encoding class constructor accepts a boolean parameter that controls whether to include BOM in the encoded output. When this parameter is set to $False, the generated encoder will not add a BOM marker.
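The effect of the constructor argument can be inspected without touching any file, because GetPreamble() returns the bytes that a writer using the encoder would place at the start of its output. A small sketch (the variable names are illustrative only):

```powershell
# The constructor argument is encoderShouldEmitUTF8Identifier.
$withBom = New-Object System.Text.UTF8Encoding $true
$noBom   = New-Object System.Text.UTF8Encoding $false

# Format the preamble bytes as hex for display.
$withBomPreamble = ($withBom.GetPreamble() | ForEach-Object { $_.ToString('X2') }) -join ' '
$noBomPreambleLength = $noBom.GetPreamble().Length

"Preamble with BOM enabled: $withBomPreamble"
"Preamble length with BOM disabled: $noBomPreambleLength"
```

With the argument set to $true the preamble is the three BOM bytes; with $false it is an empty array, which is why nothing is written ahead of the text.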

The complete implementation code is as follows:

$MyRawString = Get-Content -Raw $MyPath
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, $MyRawString, $Utf8NoBomEncoding)

Code Analysis and Technical Details

The above solution involves three key steps:

  1. File Content Reading: Use Get-Content -Raw command to read file content as a single string, preserving original formatting and line breaks.
  2. Encoder Creation: Create a BOM-free UTF-8 encoder instance via New-Object System.Text.UTF8Encoding $False.
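  3. File Writing: Call the [System.IO.File]::WriteAllLines static method to write content to the file using the specified encoder. Because $MyRawString is a single string, WriteAllLines treats it as one line and appends a line terminator after it, so the output gains one trailing newline; [System.IO.File]::WriteAllText writes the string unchanged if that matters.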

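The core advantage of this method is that it calls the .NET Framework's encoding functionality directly, giving precise control over encoding behavior. Note that .NET methods resolve relative paths against the process's working directory, which can differ from PowerShell's current location, so $MyPath should be a full path.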
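The three steps can be wrapped into a small function. The sketch below is one possible variant, not the only way to do it: the name ConvertTo-Utf8NoBom is hypothetical, it assumes the input file is UTF-8 (with or without a BOM), and it swaps WriteAllLines for WriteAllText so that no extra trailing newline is appended.

```powershell
# Hypothetical helper: rewrites a UTF-8 file in place without a BOM.
function ConvertTo-Utf8NoBom {
    param([Parameter(Mandatory)] [string] $LiteralPath)

    # .NET needs a full path; Convert-Path resolves it against PowerShell's location.
    $fullPath = Convert-Path -LiteralPath $LiteralPath

    # Step 1: read the whole file as one string.
    # Assumption: the file is UTF-8; -Encoding UTF8 keeps Windows PowerShell
    # from falling back to the ANSI code page when the file has no BOM.
    $text = Get-Content -Raw -Encoding UTF8 -LiteralPath $fullPath

    # Step 2: create an encoder that does not emit a BOM.
    $utf8NoBom = New-Object System.Text.UTF8Encoding $false

    # Step 3: write the text back unchanged (WriteAllText adds no newline).
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8NoBom)
}
```

Because the file is read completely before it is written, reading from and writing to the same path is safe here.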

Simplified Implementation Approach

In practical applications, the code can be simplified further. When no encoding is specified, the WriteAllLines method writes UTF-8 without a BOM by default, so the encoder argument can be omitted.

The simplified version code is as follows:

[IO.File]::WriteAllLines($filename, $content)

This simplified approach works correctly in most modern environments and offers more concise and readable code.
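One practical caveat with the short form is path resolution: .NET resolves relative paths against the process's working directory, not PowerShell's current location. The sketch below resolves the path first; the function name Write-Utf8NoBomLines and its parameters are hypothetical.

```powershell
# Hypothetical wrapper around the default (BOM-free) WriteAllLines overload.
function Write-Utf8NoBomLines {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [AllowEmptyCollection()] [AllowEmptyString()] [string[]] $Lines
    )

    # Resolve against PowerShell's current location; unlike Convert-Path,
    # this also works for files that do not exist yet.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)

    # No encoding argument: the default is UTF-8 without a BOM.
    [System.IO.File]::WriteAllLines($fullPath, $Lines)
}
```

A call such as Write-Utf8NoBomLines -Path .\out.txt -Lines 'first', 'second' then writes next to the script's current location rather than wherever the process was started.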

Alternative Approaches Comparison

Beyond the core .NET solution, several other methods exist for implementing BOM-free UTF-8 writing, each with its applicable scenarios and limitations.

New-Item Command Approach

PowerShell's New-Item command provides an interesting alternative. In Windows PowerShell, this command defaults to creating BOM-free UTF-8 files, contrasting sharply with Out-File's behavior.

Implementation code:

$null = New-Item -Force $MyPath -Value (Get-Content -Raw $MyPath)

Note that this method does not automatically add trailing newlines, differing slightly from Out-File's behavior.

ASCII Encoding Workaround

In some simple scenarios, ASCII encoding can be used as a workaround:

Get-Content path/to/file.ext | out-file -encoding ASCII targetFile.ext

This method is suitable only for pure ASCII content. It does not raise an error for text containing non-ASCII characters; instead, each such character is silently replaced with a question mark (?), so data is lost.
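The behavior of ASCII encoding on non-ASCII input can be observed in memory with the .NET ASCII encoder, and a simple check can guard against using the workaround on unsuitable text. In this sketch, Test-AsciiOnly is a hypothetical helper name.

```powershell
# Hypothetical helper: $true when the text contains only 7-bit ASCII characters.
function Test-AsciiOnly {
    param([Parameter(Mandatory)] [AllowEmptyString()] [string] $Text)
    return ($Text -notmatch '[^\x00-\x7F]')
}

# "caf" followed by U+00E9 (e with acute accent), built without a non-ASCII literal.
$text = "caf$([char]0xE9)"

# Encode to ASCII and decode again to see what survives.
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$roundTrip  = [System.Text.Encoding]::ASCII.GetString($asciiBytes)

"Round trip through ASCII: $roundTrip"
"Safe for the ASCII workaround: $(Test-AsciiOnly $text)"
```

The accented character comes back as a question mark, which is the same substitution that ends up in a file written with -Encoding ASCII.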

Advanced Custom Function Implementation

For scenarios requiring frequent BOM-free UTF-8 writing, custom functions can be created to encapsulate related functionality. Below is a fully functional Out-FileUtf8NoBom function implementation:

function Out-FileUtf8NoBom {
    [CmdletBinding(PositionalBinding=$false)]
    param(
        [Parameter(Mandatory, Position = 0)] [string] $LiteralPath,
        [switch] $Append,
        [switch] $NoClobber,
        [AllowNull()] [int] $Width,
        [switch] $UseLF,
        [Parameter(ValueFromPipeline)] $InputObject
    )
    
    begin {
        $dir = Split-Path -LiteralPath $LiteralPath
        if ($dir) { $dir = Convert-Path -ErrorAction Stop -LiteralPath $dir } else { $dir = $pwd.ProviderPath }
        $LiteralPath = [IO.Path]::Combine($dir, [IO.Path]::GetFileName($LiteralPath))
        
        if ($NoClobber -and (Test-Path -LiteralPath $LiteralPath)) {
            Throw [IO.IOException] "The file '$LiteralPath' already exists."
        }
        
        # StreamWriter defaults to UTF-8 without a BOM when no encoding is passed.
        $sw = New-Object System.IO.StreamWriter $LiteralPath, $Append
        $htOutStringArgs = @{}
        if ($Width) { $htOutStringArgs += @{ Width = $Width } }

        try { 
            $scriptCmd = { 
                & Microsoft.PowerShell.Utility\Out-String -Stream @htOutStringArgs | 
                  . { process { if ($UseLF) { $sw.Write(($_ + "`n")) } else { $sw.WriteLine($_) } } }
            }  
            
            $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
            $steppablePipeline.Begin($PSCmdlet)
        }
        catch { throw }
    }

    process {
        $steppablePipeline.Process($_)
    }

    end {
        $steppablePipeline.End()
        $sw.Dispose()
    }
}

This function provides an interface similar to Out-File, supporting pipeline input, append mode, no-clobber protection, and other advanced features while ensuring BOM-free UTF-8 output.

PowerShell Version Differences and Compatibility

PowerShell versions differ significantly in encoding handling, which directly affects how portable a script is.

Windows PowerShell vs. PowerShell Core

In Windows PowerShell (version 5.1 and below), cmdlets that write with -Encoding UTF8 always include a BOM, reflecting historical design choices. In PowerShell Core (version 6.0 and above), BOM-free UTF-8 is the default encoding, reflecting modern software development's need for cross-platform compatibility.

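PowerShell Core also offers richer values for the -Encoding parameter, including utf8NoBOM and utf8BOM, so the presence of a BOM can be requested explicitly.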
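As an illustration, the sketch below writes the same text with the utf8NoBOM and utf8BOM values of -Encoding. It assumes PowerShell 6 or later and is guarded by a version check, since Windows PowerShell rejects these values; the file names are placeholders.

```powershell
# Placeholder output paths in the temp directory.
$noBomPath = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding-demo-nobom.txt'
$bomPath   = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding-demo-bom.txt'

if ($PSVersionTable.PSVersion.Major -ge 6) {
    # Explicitly BOM-free (also the default in PowerShell 6+).
    'sample' | Out-File -LiteralPath $noBomPath -Encoding utf8NoBOM

    # Explicitly with a BOM, for consumers that require one.
    'sample' | Out-File -LiteralPath $bomPath -Encoding utf8BOM
}
else {
    Write-Warning 'utf8NoBOM and utf8BOM require PowerShell 6 or later.'
}
```

Stating the encoding explicitly keeps a script's intent clear even where the default would already produce BOM-free output.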

Performance Optimization and Best Practices

When handling large files or performance-critical scenarios, memory usage and performance optimization must be considered.

Memory Usage Considerations

WriteAllLines-based solutions require loading the entire file content into memory, which may cause performance issues with large files. The custom Out-FileUtf8NoBom function employs streaming processing and can better handle large file scenarios.
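For comparison, a streaming conversion can also be sketched directly with StreamReader and StreamWriter, holding only one line in memory at a time. The function name Copy-AsUtf8NoBom is hypothetical, and the sketch assumes the source and destination are different files.

```powershell
# Hypothetical helper: copies a text file line by line to a BOM-free UTF-8 file.
# Assumption: source and destination are different files.
function Copy-AsUtf8NoBom {
    param(
        [Parameter(Mandatory)] [string] $SourcePath,
        [Parameter(Mandatory)] [string] $DestinationPath
    )

    $src = Convert-Path -LiteralPath $SourcePath
    $dst = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($DestinationPath)

    # StreamReader defaults to UTF-8 and skips a BOM if one is present.
    $reader = New-Object System.IO.StreamReader $src
    try {
        # StreamWriter defaults to UTF-8 without a BOM; $false means overwrite.
        $writer = New-Object System.IO.StreamWriter $dst, $false
        try {
            # Each line is written with the platform newline, so line endings
            # are normalized and the output always ends with a newline.
            while ($null -ne ($line = $reader.ReadLine())) {
                $writer.WriteLine($line)
            }
        }
        finally {
            $writer.Dispose()
        }
    }
    finally {
        $reader.Dispose()
    }
}
```

The trade-off is that line endings are rewritten to the platform default, which may or may not be desirable for a given file.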

In-Place File Update Considerations

When reading from and writing to the same file, execution order must be carefully considered:

(Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath

Using parentheses ensures file content is completely read before the writing process begins, avoiding read-write conflicts.

Practical Application Scenarios and Selection Recommendations

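Based on different application requirements, the most suitable implementation can be selected. For one-off scripts and small files, [IO.File]::WriteAllLines is the shortest option. For large files or pipeline-oriented code that should behave like Out-File, the custom Out-FileUtf8NoBom function is the better fit. For content known to be pure ASCII, Out-File -Encoding ASCII is sufficient. In PowerShell Core, the built-in cmdlets already write BOM-free UTF-8 by default, so no workaround is needed.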

By deeply understanding PowerShell's encoding mechanisms and the .NET Framework's underlying implementation, developers can flexibly choose the most appropriate BOM-free UTF-8 file writing solution for specific scenarios, ensuring code reliability, performance, and compatibility.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.