Comprehensive Guide to Pretty-Printing XML from Command Line

Nov 19, 2025 · Programming

Keywords: XML Formatting | Command Line Tools | xmllint | XMLStarlet | xml_pp | Tidy | Python XML Processing

Abstract: This technical paper provides an in-depth analysis of various command-line tools for formatting XML documents in Unix/Linux environments. Through comparative examination of xmllint, XMLStarlet, xml_pp, Tidy, Python xml.dom.minidom, saxon-lint, saxon-HE, and xidel, the article offers comprehensive solutions for XML beautification. Detailed coverage includes installation methods, basic syntax, parameter configuration, and practical examples, enabling developers and system administrators to select the most appropriate XML formatting tools based on specific requirements.

Importance and Background of XML Formatting

XML (eXtensible Markup Language) is a widely used data exchange format, and developers need to be able to read it when debugging and maintaining documents. In practice, however, XML often arrives minified or machine-generated, with indentation and line breaks stripped, so the nesting of elements is hard to follow. This is especially inconvenient at the command line, where documents are usually inspected in a terminal or passed between tools in a pipeline.

Detailed Analysis of Core Formatting Tools

xmllint Utility

xmllint is a command-line XML processing tool based on the libxml2 library, offering powerful formatting and validation capabilities. Its basic formatting syntax is straightforward:

xmllint --format file.xml

This command automatically adds appropriate indentation to XML elements, using two spaces as the default indentation unit. On Debian-based distributions, this tool can be obtained by installing the libxml2-utils package.
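When xmllint is not installed, Python's standard library can approximate the same two-space layout. The sketch below is not xmllint: it assumes Python 3.9 or later (for `xml.etree.ElementTree.indent`), the function name `pretty_two_space` is hypothetical, and ElementTree's default parser discards comments and processing instructions.

```python
import xml.etree.ElementTree as ET


def pretty_two_space(xml_text: str) -> str:
    """Re-indent an XML string with two spaces per level (stdlib approximation of xmllint --format)."""
    root = ET.fromstring(xml_text)
    # ET.indent (Python 3.9+) rewrites whitespace-only text and tails in place.
    ET.indent(root, space="  ")
    # encoding="unicode" returns a str and, unlike xmllint, writes no XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(pretty_two_space('<root><foo a="b">lorem</foo><bar value="ipsum" /></root>'))
```

Unlike xmllint, ElementTree writes empty elements as `<bar value="ipsum" />`, with a space before `/>`.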

Users can change the indentation string with the XMLLINT_INDENT environment variable. The variable must be in xmllint's own environment, so set it as a prefix on the same command (or export it first):

XMLLINT_INDENT='    ' xmllint --format emails.xml

This command indents with four spaces instead of the default two. Notably, xmllint adds an XML declaration (<?xml version="1.0"?>) to the output, even if the original document has none.
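xmllint reads XMLLINT_INDENT from its own process environment, so a plain shell assignment followed by `;` does not reach it; the variable has to be exported or given as a prefix. The same applies when xmllint is called from a Python script: pass the variable through `env`. A minimal sketch, assuming xmllint is on `PATH`; `xmllint_command` and `xmllint_format` are hypothetical helper names, and `-` tells xmllint to read the document from standard input.

```python
import os
import shutil
import subprocess


def xmllint_command(indent: str = "    ") -> tuple[list[str], dict[str, str]]:
    """Build argv and environment for `xmllint --format -` with a custom indent string."""
    argv = ["xmllint", "--format", "-"]  # "-" = read the document from stdin
    # Copy the parent environment and add XMLLINT_INDENT for the child process only.
    env = {**os.environ, "XMLLINT_INDENT": indent}
    return argv, env


def xmllint_format(xml_text: str, indent: str = "    ") -> str:
    """Run xmllint on xml_text; raises FileNotFoundError if xmllint is not installed."""
    argv, env = xmllint_command(indent)
    result = subprocess.run(argv, input=xml_text, env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__" and shutil.which("xmllint"):
    print(xmllint_format("<root><foo>lorem</foo></root>"))
```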

XMLStarlet Toolkit

XMLStarlet is a comprehensive command-line XML toolkit, with formatting functionality implemented through the format command (abbreviated as fo):

xmlstarlet format --indent-tab file.xml

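This tool supports various output control options, including the indentation style and whether an XML declaration is written. For example, to indent with eight spaces and omit the XML declaration:

xml fo -o -s 8 emails.xml

Here, the -o option omits the XML declaration, while -s 8 specifies eight-space indentation. Without -o, XMLStarlet adds an XML declaration to the output. Depending on the distribution, the executable is installed as xmlstarlet or as xml; this example uses the shorter name together with fo, the abbreviation of format.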
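Because the executable is called `xmlstarlet` on some systems and `xml` on others, scripts often need to locate it before building the `fo` command line. The sketch below maps the options described above onto an argument list. The helper names are hypothetical, and only the short flags `-t` (tab indentation), `-s N` (N spaces), and `-o` (omit declaration) are used.

```python
import shutil
from typing import Callable, Optional


def find_xmlstarlet(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the XMLStarlet executable name found on PATH, trying both common names."""
    for name in ("xmlstarlet", "xml"):
        if which(name):
            return name
    return None


def xmlstarlet_fo_argv(exe: str, path: str, spaces: Optional[int] = None,
                       tab: bool = False, omit_decl: bool = False) -> list[str]:
    """Build an `fo` (format) command line for the given formatting options."""
    if tab and spaces is not None:
        raise ValueError("choose either tab or space indentation, not both")
    argv = [exe, "fo"]
    if tab:
        argv.append("-t")               # indent with a tab character
    elif spaces is not None:
        argv += ["-s", str(spaces)]     # indent with `spaces` spaces
    if omit_decl:
        argv.append("-o")               # omit the <?xml ...?> declaration
    argv.append(path)
    return argv
```

The resulting list can be passed straight to `subprocess.run(..., capture_output=True, text=True)`; option order does not matter to XMLStarlet.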

xml_pp Pretty-Printer

xml_pp (short for "XML pretty-printer") is a dedicated XML beautification tool shipped with the Perl module XML::Twig:

xml_pp < file.xml

Unlike the previous tools, xml_pp does not add an XML declaration by default, so its output stays closer to the original document. On Debian-based systems, this tool can be obtained by installing the xml-twig-tools package. Although xml_pp does not support custom indentation settings, its output is consistent, which makes it a good fit when the formatted file should otherwise match the source.
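To mimic xml_pp's behaviour of not adding a declaration the source lacked, a script can re-emit the original declaration only when one was present. A minimal sketch with Python's ElementTree (3.9+); the regular expression and the function name `pretty_keep_decl` are illustrative assumptions, and ElementTree still drops comments, so this is not a faithful replacement for xml_pp.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration such as <?xml version="1.0"?>.
_DECL = re.compile(r"\s*(<\?xml\s[^?]*\?>)")


def pretty_keep_decl(xml_text: str, space: str = "  ") -> str:
    """Indent xml_text, keeping the original declaration only if it had one."""
    match = _DECL.match(xml_text)
    root = ET.fromstring(xml_text)
    ET.indent(root, space=space)
    body = ET.tostring(root, encoding="unicode")  # ElementTree adds no declaration here
    return f"{match.group(1)}\n{body}" if match else body
```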

Tidy Multi-Format Support

Tidy was originally designed to format and repair HTML, but it can also process XML:

tidy -xml -i -q file.xml

Here, the -xml option tells Tidy to treat the input as XML, -i enables indentation, and -q (quiet mode) suppresses non-essential messages. Tidy is tolerant of badly formed input and attempts to correct some common errors rather than stopping at the first one.
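By default, xmllint (without its separate --recover option) and Python's ElementTree report well-formedness errors instead of repairing them, so a script that handles untrusted XML typically checks the input first and only then decides whether a tolerant tool such as Tidy is needed. A sketch of that check; `check_well_formed` is a hypothetical helper, and it only reports the first error, it does not repair anything.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_well_formed(xml_text: str) -> Optional[tuple[int, int]]:
    """Return None if xml_text is well-formed, else the (line, column) of the first error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position  # (line, column) reported by the expat parser
    return None


if __name__ == "__main__":
    print(check_well_formed("<a><b></a>"))  # mismatched tag, so a position tuple is printed
```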

Python Built-in Module Solution

For Python developers, the built-in xml.dom.minidom module can be used for XML formatting:

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' | python3 -c 'import sys, xml.dom.minidom; print(xml.dom.minidom.parseString(sys.stdin.read()).toprettyxml())'

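By default, toprettyxml() indents with a tab and prepends an <?xml version="1.0" ?> declaration; passing indent="  " changes the indentation string. This approach is particularly suitable when XML formatting is one step in a larger Python script, where the document can also be inspected or modified programmatically.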
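Expanded into a reusable function, the minidom approach has one known quirk: re-formatting input that is already indented produces extra blank lines, because minidom keeps the old whitespace as text nodes. The sketch below drops whitespace-only lines to work around that. This also removes whitespace-only lines inside text content, so it assumes data-oriented XML rather than mixed content; the name `pretty` is hypothetical.

```python
from xml.dom import minidom


def pretty(xml_text: str, indent: str = "  ") -> str:
    """Pretty-print with minidom, then drop the whitespace-only lines it leaves behind."""
    # indent="  " replaces minidom's default tab indentation.
    raw = minidom.parseString(xml_text).toprettyxml(indent=indent)
    lines = [line for line in raw.splitlines() if line.strip()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pretty('<root><foo a="b">lorem</foo><bar value="ipsum" /></root>'), end="")
```

Because blank lines are filtered out, running `pretty` on its own output returns the same text.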

Advanced XSLT Processing Tools

Saxon-HE, the open-source Home Edition of the Saxon XSLT, XQuery, and XPath processor, and saxon-lint, a command-line front end to Saxon, can pretty-print the result of an XPath or XQuery expression:

saxon-lint --indent --xpath '/' file.xml

Saxon-HE's XQuery processor can also be run directly with Java; the jar path below depends on how Saxon was installed:

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' | java -cp /usr/share/java/saxon/saxon9he.jar net.sf.saxon.Query -s:- -qs:/ '!indent=yes'

These tools suit cases where part of a document must be selected with an XPath or XQuery expression and then formatted in the same step.
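Python's ElementTree supports only a limited XPath subset, but that is enough to sketch the same idea of selecting part of a document and pretty-printing just that part. This is a swapped-in standard-library technique, not Saxon; `pretty_select` is a hypothetical helper name, and Python 3.9+ is assumed for `ET.indent`.

```python
import copy
import xml.etree.ElementTree as ET


def pretty_select(xml_text: str, path: str, space: str = "  ") -> list[str]:
    """Return each element matching the ElementTree path expression, indented on its own."""
    root = ET.fromstring(xml_text)
    results = []
    for elem in root.iterfind(path):
        fragment = copy.deepcopy(elem)  # avoid mutating the parsed tree
        fragment.tail = None            # tostring() would otherwise include trailing text
        ET.indent(fragment, space=space)
        results.append(ET.tostring(fragment, encoding="unicode"))
    return results
```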

xidel Data Extraction Tool

xidel is a versatile web data extraction tool that also supports XML formatting:

xidel --output-node-format=xml --output-node-indent -se . -s file.xml

This tool is particularly useful when processing XML data obtained from the web, enabling simultaneous data extraction and formatting.
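When XML is fetched from the web in a script, the raw response bytes should be handed to the parser unchanged, so that the encoding named in the document's own declaration is honoured. A sketch using only the standard library (Python 3.9+); the function names are hypothetical, and `fetch_pretty` needs network access.

```python
import urllib.request
import xml.etree.ElementTree as ET


def pretty_bytes(payload: bytes, space: str = "  ") -> str:
    """Parse raw XML bytes (respecting the declared encoding) and return indented text."""
    root = ET.fromstring(payload)
    ET.indent(root, space=space)
    return ET.tostring(root, encoding="unicode")


def fetch_pretty(url: str, timeout: float = 10.0) -> str:
    """Download an XML document and pretty-print it; requires network access."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return pretty_bytes(response.read())
```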

Tool Selection and Performance Considerations

When selecting XML formatting tools, multiple factors must be considered: installation convenience, processing performance, output control precision, and integration capabilities with other tools. For simple formatting needs, xmllint and xml_pp provide the most direct solutions; when more control options are needed, XMLStarlet and Tidy are better choices; and in programming environments, the Python solution offers maximum flexibility.
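On machines with different tools installed, the choice can be automated: use the first formatter found on `PATH` and fall back to the standard library otherwise. A sketch assuming the invocations shown in this article (`xmllint --format -` and `xml_pp` reading standard input); the preference order and helper names are illustrative choices, not a recommendation from any of the tools.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Candidate commands, tried in order; each reads the document from stdin.
CANDIDATES = [
    ["xmllint", "--format", "-"],
    ["xml_pp"],
]


def choose_formatter(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[list[str]]:
    """Return the first candidate whose executable is on PATH, or None."""
    for argv in CANDIDATES:
        if which(argv[0]):
            return argv
    return None


def format_xml(xml_text: str) -> str:
    """Format with an external tool if available, else with ElementTree (Python 3.9+)."""
    argv = choose_formatter()
    if argv is not None:
        return subprocess.run(argv, input=xml_text, capture_output=True,
                              text=True, check=True).stdout
    root = ET.fromstring(xml_text)
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")
```

Passing a custom `which` function makes the selection logic easy to test without the tools installed.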

Practical Application Examples

Taking a compressed XML fragment as an example:

<root><foo a="b">lorem</foo><bar value="ipsum" /></root>

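Formatting it with xmllint --format produces the following (preceded by an <?xml version="1.0"?> declaration). The other tools produce the same structure, differing mainly in indentation width, empty-element syntax, and whether a declaration is added: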

<root>
  <foo a="b">lorem</foo>
  <bar value="ipsum"/>
</root>

This formatted output significantly improves the readability and maintainability of XML documents.

Conclusion and Best Practices

Command-line XML formatting is a fundamental skill that every system administrator and developer should master. By properly selecting and using appropriate tools, efficiency in XML-related work can be significantly improved. It is recommended to master 2-3 different formatting tools based on specific work environments and requirements to handle various complex scenarios. Additionally, integrating XML formatting functionality into automation scripts ensures that output XML data always maintains good readability.
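As one example of building formatting into automation, the sketch below re-indents every `*.xml` file in a directory in place. It assumes Python 3.9+, configures ElementTree to keep comments inside the root element (its default parser discards them), and writes UTF-8 with a declaration. Comments outside the root element, the DOCTYPE, and processing instructions are still lost, so keep the originals under version control. `format_tree_in_place` is a hypothetical name.

```python
import pathlib
import xml.etree.ElementTree as ET


def format_tree_in_place(directory: str, space: str = "  ") -> list[pathlib.Path]:
    """Re-indent every *.xml file directly under `directory`; return the files rewritten."""
    rewritten = []
    for path in sorted(pathlib.Path(directory).glob("*.xml")):
        # Keep comments inside the root element (dropped by the default parser).
        parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
        tree = ET.parse(path, parser=parser)
        ET.indent(tree, space=space)
        tree.write(path, encoding="utf-8", xml_declaration=True)
        rewritten.append(path)
    return rewritten
```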

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.