Keywords: XSLT | Non-breaking Space | Character Entities | XML Parsing | Numeric Character Reference
Abstract: This article analyzes the XML parsing error that occurs when the HTML entity &nbsp; is used to insert a non-breaking space in an XSLT stylesheet. By examining the difference between HTML character entity references and XML's predefined entities, it recommends the numeric character reference &#160; as the standard solution. The article also discusses character encoding and output method settings, with code examples and practical guidance.
Problem Background and Error Analysis
During XSLT stylesheet development, developers often need to insert non-breaking spaces to control text formatting. However, using the familiar HTML entity reference &nbsp; directly in a stylesheet causes an XML parsing error such as: XML Parsing Error: undefined entity. The root cause is that XML and HTML handle character entities differently.
Differences Between XML and HTML Character Entities
The XML specification predefines only five character entities: &amp;, &lt;, &gt;, &apos;, and &quot;. In contrast, the HTML standard defines over 200 character entity references, including &nbsp;. An XSLT stylesheet is itself an XML document, so the processor parses it under XML rules and rejects any HTML-specific entity reference that has not been declared.
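The difference can be seen in a small stylesheet. The following is a minimal sketch, not taken from the original article: the paragraph content is invented for illustration, and the exact wording of the error depends on the parser.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- The five predefined entities are accepted by any XML parser. -->
    <p>&lt;tag&gt; &amp; &quot;quoted&quot; &apos;text&apos;</p>
    <!-- An undeclared HTML entity is not. Moving the line below out of this
         comment should make the stylesheet fail to parse with an
         undefined-entity error:
    <p>Text start&nbsp;Text end</p>
    -->
  </xsl:template>
</xsl:stylesheet>
```

The failing line is kept inside a comment so that the stylesheet remains well-formed as written.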
Standard Solution: Using Numeric Character References
The most reliable solution is to use the numeric character reference &#160; (or its hexadecimal form, &#xA0;). It refers directly to the Unicode code point of the non-breaking space, U+00A0, and is recognized by every conforming XML parser without any entity declaration.
<xsl:template match="example">
  <span>Text start<xsl:text>&#160;</xsl:text>Text end</span>
</xsl:template>
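To place the template in context, a complete stylesheet might look like the sketch below. It assumes an input document whose root element is named example (for instance, a file containing only <example/>); the surrounding p element and the second span are additions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches the assumed <example/> element of the input document. -->
  <xsl:template match="example">
    <p>
      <!-- Decimal reference wrapped in xsl:text. -->
      <span>Text start<xsl:text>&#160;</xsl:text>Text end</span>
      <!-- Hexadecimal reference written inline; it denotes the same character. -->
      <span>Text start&#xA0;Text end</span>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Both spans should produce the same character in the result. The xsl:text wrapper is optional here, because U+00A0 is not XML whitespace and is therefore not stripped from the stylesheet; wrapping it simply makes the intent explicit.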
Character Encoding and Output Settings
Proper character encoding settings are crucial for ensuring correct display of non-breaking spaces. It is recommended to explicitly specify output encoding in XSLT stylesheets:
<xsl:output method="html" encoding="UTF-8" version="4.0" indent="yes"/>
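UTF-8 can represent every Unicode character, including the non-breaking space. Also make sure that the source XML and XSLT files are actually saved in the encoding they declare, and that the resulting page is served or labeled as UTF-8; if UTF-8 output is read as ISO-8859-1, the non-breaking space appears with a stray "Â" in front of it.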
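As a sketch of how these settings fit together, the stylesheet below declares UTF-8 both for itself and for its output. The page content (the title and the "10 km" text) is invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize the result as HTML, encoded as UTF-8. -->
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>Non-breaking space test</title>
      </head>
      <body>
        <!-- The reference keeps the number and its unit on one line. -->
        <p>10&#160;km</p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

With the html output method, a processor may write the character either as raw UTF-8 bytes or as &nbsp; in the result; browsers treat both the same way. Processors that follow the XSLT 1.0 html output method are also expected to add a META element declaring the encoding inside HEAD, though it is worth checking the actual output of the processor in use.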
Alternative Approaches and Technical Considerations
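While similar results can be achieved by declaring a custom entity in the stylesheet's DOCTYPE or by using the disable-output-escaping attribute, both have drawbacks: an entity declaration must be repeated in every stylesheet that uses it, and disable-output-escaping is an optional feature that a processor may ignore (for example, when the result is not serialized). The numeric character reference requires neither and offers the best portability and standards compliance.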
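For comparison, the two alternatives could be written roughly as follows. This is an illustrative sketch under the same assumption of an example input element, not a recommendation; the wrapping p element is added only to hold both variants.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Alternative 1: declare the entity in an internal DTD subset so that
     &nbsp; becomes legal inside this one stylesheet. -->
<!DOCTYPE xsl:stylesheet [
  <!ENTITY nbsp "&#160;">
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>

  <xsl:template match="example">
    <p>
      <!-- The XML parser expands the entity to U+00A0 before XSLT runs. -->
      <span>Text start&nbsp;Text end</span>
      <!-- Alternative 2: emit the literal text "&nbsp;" into the output.
           Processors are not required to honor disable-output-escaping. -->
      <span>Text start<xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>Text end</span>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

If the processor ignores disable-output-escaping, the second span ends up showing the literal text "&nbsp;" on the page, which is one reason the numeric reference is the safer choice.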
Practical Recommendations and Best Practices
In practice, use &#160; consistently as the representation of the non-breaking space in stylesheets. Test the output in the browsers and XSLT processors you target to confirm that the final HTML renders with the intended formatting.