Keywords: Java | XML Pretty Printing | DOMImplementationLS | Transformer | Apache XML Serializer | Groovy XmlUtil
Abstract: This article explores several techniques for pretty printing XML strings in Java. It focuses on the DOM Level 3 Load and Save API (DOMImplementationLS) and compares it with the JAXP Transformer, the Apache Xerces XMLSerializer, and Groovy's XmlUtil. Complete code examples show how to turn unformatted XML strings into indented, line-broken output, and the article also covers exception handling, performance considerations, and best practices.
Technical Background of XML Pretty Printing
In software development, XML is a widely used data exchange format that often has to be shown to users or developers in a readable form. Machine-generated XML strings typically lack indentation and line breaks, which makes them hard to read. The JDK provides several built-in APIs for pretty printing XML, and third-party libraries and other JVM languages add further options. Each has its own characteristics and typical use cases.
Modern Implementation Based on DOMImplementationLS
Since Java 5 (JAXP 1.3), the JDK has supported the W3C DOM Level 3 Load and Save API, which offers a concise, standardized way to serialize XML. Below is a core implementation based on DOMImplementationLS:
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
public class XmlFormatter {
    public String format(String xml) {
        try {
            // Parse the input string and keep the root element for serialization
            final InputSource src = new InputSource(new StringReader(xml));
            final Node document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(src).getDocumentElement();
            // Emit an XML declaration only if the input had one
            final Boolean keepDeclaration = Boolean.valueOf(xml.startsWith("<?xml"));
            // Some environments may need the registry pointed at a DOMImplementationSource, e.g.:
            // System.setProperty(DOMImplementationRegistry.PROPERTY,
            //         "com.sun.org.apache.xerces.internal.dom.DOMImplementationSourceImpl");
            final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            final DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            final LSSerializer writer = impl.createLSSerializer();
            // Ask the serializer to insert line breaks and indentation
            writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
            // Control whether the <?xml ...?> declaration is written
            writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
            return writer.writeToString(document);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
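The main advantage of this method is that it relies only on standard W3C DOM Level 3 interfaces. Setting the format-pretty-print parameter to true makes the serializer insert line breaks and indentation, and the xml-declaration parameter controls whether an XML declaration is emitted. Two caveats apply. First, in some environments DOMImplementationRegistry cannot find an implementation unless the DOMImplementationRegistry.PROPERTY system property (org.w3c.dom.DOMImplementationSourceList) names a DOMImplementationSource class. Second, because writeToString returns a Java String, the declaration it emits typically reports encoding="UTF-16" regardless of the encoding of the original input.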
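One way to avoid the registry lookup is sketched below. It assumes the JDK's built-in parser, whose DOM implementation also implements DOMImplementationLS, so the parsed document's own implementation can be cast and used directly. Other parsers are not required to support this cast. Writing through an LSOutput with an explicit encoding should also make the declaration name UTF-8 instead of UTF-16. The class name LsFormatter is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper: pretty prints XML without going through DOMImplementationRegistry.
final class LsFormatter {
    static String format(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Assumption: the parser's DOM implementation also implements DOMImplementationLS
            // (true for the JDK's built-in parser; otherwise a ClassCastException is wrapped below).
            DOMImplementationLS impl = (DOMImplementationLS) doc.getImplementation();
            LSSerializer serializer = impl.createLSSerializer();
            DOMConfiguration config = serializer.getDomConfig();
            // Only request pretty printing if this serializer supports it
            if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
                config.setParameter("format-pretty-print", Boolean.TRUE);
            }
            // An LSOutput lets the caller choose the encoding named in the declaration
            LSOutput output = impl.createLSOutput();
            output.setEncoding("UTF-8");
            StringWriter target = new StringWriter();
            output.setCharacterStream(target);
            serializer.write(doc, output);
            return target.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not pretty print XML", e);
        }
    }
}
```

Calling LsFormatter.format("<root><child>hello</child></root>") should return the document spread over several lines, preceded by a declaration that names UTF-8.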
Traditional Transformer-Based Approach
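Before DOM Level 3 Load and Save was available in the JDK, the JAXP Transformer API (TrAX) was the usual tool for formatting XML in Java. Calling newTransformer() without a stylesheet creates an identity transformation, which copies the input to the output and can indent it along the way: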
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;
public class TransformerXmlFormatter {
    public static String prettyFormat(String input, int indent) {
        try {
            StreamSource xmlInput = new StreamSource(new StringReader(input));
            StringWriter stringWriter = new StringWriter();
            StreamResult xmlOutput = new StreamResult(stringWriter);
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            try {
                // "indent-number" is understood by the JDK's built-in factory; others may reject it
                transformerFactory.setAttribute("indent-number", indent);
            } catch (IllegalArgumentException ignored) {
                // Unsupported attribute: rely on the indent-amount output property below
            }
            // newTransformer() without a stylesheet creates an identity transformation
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // Xalan/JDK-specific indentation width; other processors may ignore it
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                    String.valueOf(indent));
            transformer.transform(xmlInput, xmlOutput);
            return stringWriter.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
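This method requires setting the indentation explicitly, and its output can vary between Java versions. In particular, indent-number is a factory attribute recognized by the JDK's built-in (XSLTC-based) TransformerFactory, and other implementations may reject it with an IllegalArgumentException. Likewise, {http://xml.apache.org/xslt}indent-amount is a Xalan-specific output property that other XSLT processors are free to ignore.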
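A related version difference concerns input that is already partly indented: some newer JDK releases are reported to keep the existing whitespace-only text nodes and indent around them, which produces blank lines. A common workaround, sketched below, removes those nodes from a parsed DOM and then runs the identity transformation on a DOMSource. The names ReindentingFormatter and stripWhitespaceNodes are hypothetical, and the sketch assumes that whitespace between elements is insignificant, which does not hold for mixed content or xml:space="preserve".

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: re-indents XML that may already contain indentation.
final class ReindentingFormatter {
    static String reindent(String xml, int indent) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripWhitespaceNodes(doc.getDocumentElement());
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Xalan/JDK-specific indentation width; other processors may ignore it
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                    String.valueOf(indent));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not re-indent XML", e);
        }
    }

    // Removes text nodes that contain only whitespace, recursing into child elements
    static void stripWhitespaceNodes(Node node) {
        NodeList children = node.getChildNodes();
        // Iterate backwards because the NodeList is live and shrinks as nodes are removed
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
        }
    }
}
```

Working on a DOMSource also means an already parsed Document can be formatted without first converting it back to a string.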
Apache XML Serializer Solution
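For scenarios that need finer control, such as a maximum line width, Apache Xerces-J provides the org.apache.xml.serialize package (OutputFormat and XMLSerializer). It requires the Xerces jar on the classpath and has been deprecated since Xerces 2.9.0, whose documentation recommends the DOM Level 3 LSSerializer or the JAXP Transformer instead, so it is best reserved for existing code: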
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
public class ApacheXmlFormatter {
    public String format(String unformattedXml) {
        try {
            Document document = parseXml(unformattedXml);
            // Take the output method and document type from the parsed document
            OutputFormat format = new OutputFormat(document);
            // setIndenting resets indent and line width to defaults, so call it first
            format.setIndenting(true);
            format.setIndent(2);       // two spaces per nesting level
            format.setLineWidth(65);   // wrap long lines (0 disables wrapping)
            StringWriter out = new StringWriter();
            XMLSerializer serializer = new XMLSerializer(out, format);
            serializer.serialize(document);
            return out.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
    // Parses an XML string (not a file) into a DOM Document
    private Document parseXml(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource(new StringReader(xml));
            return db.parse(is);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new RuntimeException(e);
        }
    }
}
Simplified Approach in Groovy
For developers using Groovy, XML processing becomes more concise. Groovy provides the XmlUtil class to simplify XML serialization operations:
import groovy.xml.XmlUtil
def unformattedXml = '<languages><language id="1">Groovy</language></languages>'
def formattedXml = XmlUtil.serialize(unformattedXml)
println formattedXml
Groovy's XmlUtil.serialize() method is overloaded for several input types, including String, GPathResult, Node (as produced by XmlParser), and org.w3c.dom.Element, so it can format XML regardless of how the XML was obtained.
Performance and Compatibility Considerations
When selecting an XML pretty printing solution, consider the following factors:
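Performance Impact: Every approach must read the whole input and write out a complete re-serialized copy. The DOM-based approaches (DOMImplementationLS and Apache XMLSerializer) also hold the entire document tree in memory, so for large XML documents memory use and CPU time can become significant.
Java Version Compatibility: DOMImplementationLS has been part of the JDK since Java 5 (JAXP 1.3), and the Transformer API since Java 1.4 (JAXP 1.1). Apache XMLSerializer requires the Xerces-J library as an additional dependency and is deprecated.
Output Consistency: Different approaches may differ in details such as indentation width, the encoding reported in the XML declaration, and the handling of whitespace already present in the input.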
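For the memory concern raised under Performance Impact, a streaming approach avoids building any tree. The sketch below, with the hypothetical name StaxIndenter, copies events from a StAX XMLStreamReader to an XMLStreamWriter and inserts newlines and indentation itself, so memory use depends on nesting depth rather than document size. It is a simplification: it drops the XML declaration, comments, and processing instructions, writes CDATA sections as escaped text, and adds whitespace around child elements even in mixed content.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical streaming pretty printer built on StAX.
final class StaxIndenter {
    static void indent(Reader source, Writer target, int indentSize) {
        try {
            XMLInputFactory inFactory = XMLInputFactory.newInstance();
            // Refuse DTDs so no external entities are resolved
            inFactory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLStreamReader in = inFactory.createXMLStreamReader(source);
            XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(target);
            int depth = 0;
            int lastWritten = -1; // type of the last event copied to the output
            while (in.hasNext()) {
                int event = in.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (lastWritten != -1) {
                        newLine(out, depth, indentSize);
                    }
                    copyStartElement(in, out);
                    depth++;
                    lastWritten = event;
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                    // Put the end tag on its own line only if the element had child elements
                    if (lastWritten == XMLStreamConstants.END_ELEMENT) {
                        newLine(out, depth, indentSize);
                    }
                    out.writeEndElement();
                    lastWritten = event;
                } else if ((event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA)
                        && !in.getText().trim().isEmpty()) {
                    // Whitespace-only text between tags is dropped; other text is copied
                    out.writeCharacters(in.getText());
                    lastWritten = XMLStreamConstants.CHARACTERS;
                }
                // Comments, processing instructions, and other events are skipped
            }
            out.flush();
            in.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    private static void newLine(XMLStreamWriter out, int depth, int indentSize)
            throws XMLStreamException {
        StringBuilder sb = new StringBuilder("\n");
        for (int i = 0; i < depth * indentSize; i++) {
            sb.append(' ');
        }
        out.writeCharacters(sb.toString());
    }

    // Copies the element name, its namespace declarations, and its attributes
    private static void copyStartElement(XMLStreamReader in, XMLStreamWriter out)
            throws XMLStreamException {
        String uri = in.getNamespaceURI();
        if (uri == null || uri.isEmpty()) {
            out.writeStartElement(in.getLocalName());
        } else {
            String prefix = in.getPrefix();
            out.writeStartElement(prefix == null ? "" : prefix, in.getLocalName(), uri);
        }
        for (int i = 0; i < in.getNamespaceCount(); i++) {
            String nsPrefix = in.getNamespacePrefix(i);
            if (nsPrefix == null || nsPrefix.isEmpty()) {
                out.writeDefaultNamespace(in.getNamespaceURI(i));
            } else {
                out.writeNamespace(nsPrefix, in.getNamespaceURI(i));
            }
        }
        for (int i = 0; i < in.getAttributeCount(); i++) {
            String attrUri = in.getAttributeNamespace(i);
            if (attrUri == null || attrUri.isEmpty()) {
                out.writeAttribute(in.getAttributeLocalName(i), in.getAttributeValue(i));
            } else {
                out.writeAttribute(in.getAttributePrefix(i), attrUri,
                        in.getAttributeLocalName(i), in.getAttributeValue(i));
            }
        }
    }
}
```

Because indent takes a Reader and a Writer, it can be pointed at a file or network stream directly, so the whole document never has to be held in memory as a String.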
Best Practice Recommendations
Based on practical project experience, the following best practices are recommended:
1. Exception Handling: In production code, distinguish malformed input (for example, SAXException) from configuration or serialization failures instead of wrapping every error in a generic RuntimeException.
2. Memory Management: For large XML documents, consider a streaming API such as StAX instead of building a full DOM tree, to avoid an OutOfMemoryError.
3. Encoding Handling: Make sure the encoding named in the XML declaration matches the encoding actually used when the formatted text is written out as bytes; a String result on its own does not guarantee this.
4. Security: When processing untrusted input, configure the parser securely, for example by disallowing DOCTYPE declarations, to prevent XML External Entity (XXE) attacks.
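Recommendations 3 and 4 can be combined in one sketch. The hypothetical SafeXmlFormatter below rejects any DOCTYPE, which blocks XXE and entity-expansion payloads, using the Xerces feature http://apache.org/xml/features/disallow-doctype-decl that the JDK's built-in parser supports. It then returns bytes produced by a Transformer whose output encoding is fixed to UTF-8, so the declaration and the bytes agree. Treat it as a starting point rather than a complete hardening checklist.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper combining a hardened parser with an explicit output encoding.
final class SafeXmlFormatter {
    static byte[] formatToUtf8(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Reject any DOCTYPE: blocks XXE and entity-expansion payloads.
            // This Xerces feature is supported by the JDK's built-in parser.
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            dbf.setXIncludeAware(false);
            dbf.setExpandEntityReferences(false);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // The declaration and the produced bytes both use UTF-8
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(bytes));
            return bytes.toByteArray();
        } catch (Exception e) {
            // Malformed or disallowed input (including any DOCTYPE) ends up here
            throw new IllegalArgumentException("Rejected or malformed XML", e);
        }
    }
}
```

Returning bytes rather than a String keeps the caller from re-encoding the text with a charset that contradicts the declaration.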
Practical Application Example
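The following demo runs two of the formatters above on the same input so that their outputs can be compared. It assumes XmlFormatter and TransformerXmlFormatter are compiled in the same package: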
public class XmlFormattingDemo {
    public static void main(String[] args) {
        String testXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
                "<root><child>content</child><empty/></root>";
        // Using DOMImplementationLS method
        XmlFormatter formatter = new XmlFormatter();
        String result1 = formatter.format(testXml);
        System.out.println("DOMImplementationLS result:");
        System.out.println(result1);
        // Using Transformer method
        String result2 = TransformerXmlFormatter.prettyFormat(testXml, 2);
        System.out.println("Transformer result:");
        System.out.println(result2);
    }
}
Comparing the outputs, for example the XML declaration each method emits, helps developers choose the pretty printing approach that best fits their requirements.