What is CDATA in XML and what are its use cases and limitations? - Interview Question

CDATA (Character Data) sections in XML mark a block of text that the XML parser reads as plain character data rather than as markup: a < inside the section does not start a tag, and an & does not start an entity reference. This makes CDATA sections useful when you need to include special characters (such as <, >, and &) or code snippets in an XML document.

Basic Syntax of CDATA

CDATA sections start with <![CDATA[ and end with ]]>:

xml
<description>
    <![CDATA[
        Markup characters such as < > & can appear here without escaping
        The parser reports them as literal text instead of treating them as tags or entity references
    ]]>
</description>
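To show what "not parsed as markup" means in practice, here is a minimal sketch using Python's standard `xml.etree.ElementTree` (an illustrative choice, not something the original prescribes). A CDATA section and the equivalent entity-escaped text produce exactly the same string after parsing. The element name `d` is arbitrary.

```python
import xml.etree.ElementTree as ET

# The same content written two ways: inside a CDATA section, and with entity escapes
with_cdata = "<d><![CDATA[5 < 10 && x > y]]></d>"
escaped = "<d>5 &lt; 10 &amp;&amp; x &gt; y</d>"

# Both parse to identical character data; the application cannot tell them apart
text_from_cdata = ET.fromstring(with_cdata).text
text_from_escaped = ET.fromstring(escaped).text

print(text_from_cdata)                       # 5 < 10 && x > y
print(text_from_cdata == text_from_escaped)  # True
```

CDATA is therefore a notational convenience for the author of the document. It does not give the receiving program any extra information.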

Use Cases for CDATA

1. Including Code Snippets

xml
<code>
    <![CDATA[
        function hello() {
            if (x < 10) {
                return "Hello";
            }
        }
    ]]>
</code>

2. Including Mathematical Formulas

xml
<formula>
    <![CDATA[
        E = mc²
        x < y && y > z
    ]]>
</formula>

3. Including HTML or XML Fragments

xml
<content>
    <![CDATA[
        <div class="header">
            <p>Welcome to <strong>XML</strong></p>
        </div>
    ]]>
</content>
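Because the fragment sits inside CDATA, the parser does not build `div`, `p`, or `strong` elements from it. The whole fragment arrives as one string. The short Python sketch below (again assuming the standard library's ElementTree) contrasts the same characters with and without the CDATA wrapper:

```python
import xml.etree.ElementTree as ET

fragment = "<p>Welcome to <strong>XML</strong></p>"

# Wrapped in CDATA: the HTML is plain text content of <content>
as_text = ET.fromstring("<content><![CDATA[" + fragment + "]]></content>")
# Without CDATA: the same characters become real child elements
as_markup = ET.fromstring("<content>" + fragment + "</content>")

print(len(as_text), as_text.text)        # 0 <p>Welcome to <strong>XML</strong></p>
print(len(as_markup), as_markup[0].tag)  # 1 p
```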

4. Including Special Character Data

xml
<data>
    <![CDATA[
        Special characters: < > & " '
        Comparison: 5 < 10, 20 > 15
    ]]>
</data>

Limitations and Considerations of CDATA

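Cannot be nested: the first ]]> always ends the section, so an inner <![CDATA[ is just literal text and the outer ]]> that follows is left dangling as an error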

xml
<!-- Error: CDATA cannot be nested -->
<data>
    <![CDATA[
        Outer CDATA
        <![CDATA[Inner CDATA]]>
    ]]>
</data>

Cannot contain the end marker: a CDATA section cannot contain the string ]]>, because that string always terminates the section

xml
<!-- Error: Contains end marker -->
<data>
    <![CDATA[
        This contains ]]> which is not allowed
    ]]>
</data>
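The usual workaround is to split the offending `]]>` across two adjacent CDATA sections: end the first section after `]]` and start the second with `>`. The sketch below does this in Python. `wrap_in_cdata` is a hypothetical helper name, and ElementTree is used only to confirm that the text round-trips.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text: str) -> str:
    """Hypothetical helper: wrap text in CDATA so ']]>' never appears inside one section."""
    # Each "]]>" becomes "]]" + "]]>" (close) + "<![CDATA[" (reopen) + ">"
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


payload = "This contains ]]> which needs special handling"
document = "<data>" + wrap_in_cdata(payload) + "</data>"
print(document)
# <data><![CDATA[This contains ]]]]><![CDATA[> which needs special handling]]></data>

# The parser joins the adjacent sections back into the original text
print(ET.fromstring(document).text == payload)  # True
```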

Case sensitive: CDATA markers must be uppercase

xml
<!-- Error: CDATA must be uppercase -->
<data>
    <![cdata[This is wrong]]>
</data>
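Conforming parsers enforce these rules as well-formedness errors, not just as style conventions. As a quick check, assuming Python's expat-backed ElementTree, each malformed example from this section (condensed to one line) raises a parse error. A valid section is included for comparison.

```python
import xml.etree.ElementTree as ET

malformed = {
    "nested": "<data><![CDATA[Outer <![CDATA[Inner CDATA]]> ]]></data>",
    "contains ]]>": "<data><![CDATA[This contains ]]> which is not allowed]]></data>",
    "lowercase": "<data><![cdata[This is wrong]]></data>",
}

rejected = {}
for name, text in malformed.items():
    try:
        ET.fromstring(text)
        rejected[name] = False
    except ET.ParseError as err:
        # err describes the problem and its line/column position
        rejected[name] = True
        print(f"{name}: {err}")

# A well-formed section parses normally
print(ET.fromstring("<data><![CDATA[ok]]></data>").text)  # ok
```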

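Whitespace preserved: all whitespace inside a CDATA section (spaces, tabs, newlines, indentation) is passed to the application unchanged, although the parser still normalizes line endings such as CRLF to LF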

xml
<data>
    <![CDATA[
        Line 1
        Line 2
            Indented line
    ]]>
</data>
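Because whitespace arrives unchanged, the application also receives the XML file's own indentation. This includes the newline and spaces between `<data>` and `<![CDATA[`, which are ordinary element text outside the section. The minimal Python sketch below (an assumed approach using ElementTree and `textwrap.dedent`) shows the raw text and one way to strip the formatting indentation while keeping the relative indent:

```python
import textwrap
import xml.etree.ElementTree as ET

doc = """<data>
    <![CDATA[
        Line 1
        Line 2
            Indented line
    ]]>
</data>"""

# Raw text: surrounding indentation plus every space and newline inside the section
raw = ET.fromstring(doc).text
print(repr(raw))

# dedent removes the common 8-space margin; strip drops the blank first/last lines
lines = textwrap.dedent(raw).strip("\n").splitlines()
print(lines)  # ['Line 1', 'Line 2', '    Indented line']
```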

Comparison of CDATA and Entity References

Feature	CDATA	Entity References
Syntax	`<![CDATA[content]]>`	`&lt;` `&gt;` `&amp;` etc.
Readability	High, displays original content directly	Low, requires conversion
Applicability	Large blocks of text	Individual characters
Performance	Implementation-dependent; the difference is rarely significant	Implementation-dependent; the difference is rarely significant
Flexibility	Low: allowed only in element content (not in attribute values) and cannot contain ]]>	High: usable in element content and attribute values, one character at a time

When to Use CDATA

Situations suitable for CDATA:

Text containing many special characters
Code snippets that need to preserve original formatting
Content containing other markup languages (HTML, JavaScript, etc.)
Need to avoid frequent character escaping

Situations not suitable for CDATA:

Only a few special characters
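The content must be processed as structured XML (for example, queried by element) rather than as one opaque string
Content may contain the ]]> string
The content must keep its CDATA form after passing through other XML tools (many APIs report CDATA as plain text and write it back escaped)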
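The compatibility concern is mostly about round-tripping. Many APIs expose a CDATA section as ordinary text, so the section boundaries are lost once the document is re-serialized. For example, Python's ElementTree (used here only as an illustration) writes the text back with entity escapes:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<query><![CDATA[age > 18 AND score < 50]]></query>")

# The text survives, but the CDATA wrapper does not
print(root.text)                              # age > 18 AND score < 50
print(ET.tostring(root, encoding="unicode"))  # <query>age &gt; 18 AND score &lt; 50</query>
```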

Practical Application Examples of CDATA

1. Web Service Configuration

xml
<configuration>
    <script>
        <![CDATA[
            $(document).ready(function() {
                $("#button").click(function() {
                    if (count < 10) {
                        alert("Click count: " + count);
                    }
                });
            });
        ]]>
    </script>
</configuration>

2. Database Query Storage

xml
<queries>
    <query id="getUser">
        <![CDATA[
            SELECT * FROM users 
            WHERE age > 18 AND status = 'active'
            ORDER BY name ASC
        ]]>
    </query>
</queries>
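An application would typically load such a query by its `id` and pass the text straight to a database driver. The sketch below assumes Python with the standard `sqlite3` module and an in-memory `users` table filled with made-up rows. `load_queries` is a hypothetical helper, not part of any framework.

```python
import sqlite3
import xml.etree.ElementTree as ET

QUERIES_XML = """<queries>
    <query id="getUser">
        <![CDATA[
            SELECT * FROM users
            WHERE age > 18 AND status = 'active'
            ORDER BY name ASC
        ]]>
    </query>
</queries>"""


def load_queries(xml_text: str) -> dict:
    """Hypothetical helper: map each <query> id to its SQL text."""
    root = ET.fromstring(xml_text)
    # .text includes the indentation around the CDATA section, so strip it
    return {q.get("id"): q.text.strip() for q in root.iter("query")}


queries = load_queries(QUERIES_XML)

# Made-up sample data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("Carol", 30, "active"), ("Alice", 25, "active"),
     ("Bob", 17, "active"), ("Dave", 40, "inactive")],
)

# The > and the quotes reach SQLite exactly as written in the XML
rows = conn.execute(queries["getUser"]).fetchall()
print(rows)  # [('Alice', 25, 'active'), ('Carol', 30, 'active')]
```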

3. Template Content

xml
<template>
    <![CDATA[
        <html>
        <head><title>${title}</title></head>
        <body>
            <h1>Welcome, ${username}!</h1>
            <p>Your balance is: $${balance}</p>
        </body>
        </html>
    ]]>
</template>
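The original does not name a template engine. The sketch below assumes Python's `string.Template`, which happens to use the same `${name}` placeholder syntax, and fills a shortened copy of the template. The balance line is left out because `string.Template` treats `$$` as an escaped literal `$`, so `$${balance}` would come out as the literal text `${balance}` rather than a value. How that line behaves depends on the unnamed engine.

```python
import xml.etree.ElementTree as ET
from string import Template

TEMPLATE_XML = """<template><![CDATA[<html>
<head><title>${title}</title></head>
<body><h1>Welcome, ${username}!</h1></body>
</html>]]></template>"""

# The CDATA content comes back as a plain string, HTML markup included
raw_template = ET.fromstring(TEMPLATE_XML).text

# substitute() replaces ${name} placeholders and raises KeyError if one is missing
html = Template(raw_template).substitute(title="Home", username="Ada")
print(html)
```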

CDATA Processing in Different Languages

Java DOM

java
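// Assumes `document` is an existing org.w3c.dom.Document; Element and CDATASection are org.w3c.dom types
Element element = document.createElement("description");
CDATASection cdata = document.createCDATASection("Text with <special> characters");
// A CDATASection node is serialized as <![CDATA[...]]>, so < and > are written unescaped
element.appendChild(cdata);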
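Python's standard library offers the same DOM-style call through `xml.dom.minidom`, which, unlike ElementTree, has a real CDATA node type. The minimal sketch below also shows that minidom refuses to serialize a section containing the `]]>` end marker:

```python
from xml.dom.minidom import Document

doc = Document()
description = doc.createElement("description")
# createCDATASection corresponds to the Java DOM call of the same name
description.appendChild(doc.createCDATASection("Text with <special> characters"))
doc.appendChild(description)

print(description.toxml())
# <description><![CDATA[Text with <special> characters]]></description>

# Creating the node succeeds, but serializing it raises ValueError
bad = doc.createCDATASection("contains ]]> here")
try:
    bad.toxml()
except ValueError as exc:
    print("rejected:", exc)
```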

Python ElementTree

python
import xml.etree.ElementTree as ET

element = ET.Element("description")
element.text = "Text with <special> characters"
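# ElementTree has no CDATA node type: special characters are escaped on serialization instead
print(ET.tostring(element, encoding="unicode"))
# <description>Text with &lt;special&gt; characters</description>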

CDATA sections are an important tool in XML for handling special characters and raw text content. Proper use can improve the readability and maintainability of XML documents.