What are well-formedness and validity in XML documents, and how do they differ?

February 21, 14:22

Well-formedness and validity are two distinct levels of correctness for an XML document: the first concerns syntax, the second concerns conformance to a declared structure.

Well-formed

Well-formed means that an XML document complies with XML syntax rules and can be correctly parsed by an XML parser.

Requirements for Well-formedness

  1. Must have a single root element

    xml
    <!-- Correct: has single root element -->
    <root>
      <child>content</child>
    </root>

    <!-- Incorrect: no single root element -->
    <child1>content</child1>
    <child2>content</child2>
  2. All tags must be properly closed

    xml
    <!-- Correct: all tags are closed -->
    <root>
      <child>content</child>
    </root>

    <!-- Incorrect: tags not closed -->
    <root>
      <child>content
    </root>
  3. Tags must be properly nested

    xml
    <!-- Correct: properly nested -->
    <root>
      <parent>
        <child>content</child>
      </parent>
    </root>

    <!-- Incorrect: improperly nested -->
    <root>
      <parent>
        <child>content
      </parent>
      </child>
    </root>
  4. Attribute values must be enclosed in quotes

    xml
    <!-- Correct: attribute values in quotes -->
    <book id="1" category="web">content</book>

    <!-- Incorrect: attribute values not in quotes -->
    <book id=1 category=web>content</book>
  5. Tag names are case-sensitive

    xml
    <!-- Correct: consistent case -->
    <root>
      <Child>content</Child>
    </root>

    <!-- Incorrect: inconsistent case -->
    <root>
      <child>content</Child>
    </root>
  6. Special characters must be escaped (< and & always; > is escaped by convention)

    xml
    <!-- Correct: special characters escaped -->
    <data>5 &lt; 10 &amp; 20 &gt; 15</data>

    <!-- Incorrect: special characters not escaped -->
    <data>5 < 10 & 20 > 15</data>
  7. Empty elements must be properly represented

    xml
    <!-- Correct: empty element representation -->
    <empty/>
    <empty></empty>

    <!-- Incorrect: improper empty element representation -->
    <empty>

Valid

Valid means that an XML document is not only well-formed but also conforms to a specific Document Type Definition (DTD) or XML Schema.

Requirements for Validity

  1. Must be well-formed

    • A valid document must first be well-formed
  2. Must conform to DTD or Schema

    xml
    <?xml version="1.0"?>
    <!-- XML document (the XML declaration must come first, before any comment) -->
    <!DOCTYPE note SYSTEM "note.dtd">
    <note>
      <to>John</to>
      <from>Jane</from>
      <heading>Reminder</heading>
      <body>Don't forget the meeting!</body>
    </note>

    <!-- DTD file (note.dtd) -->
    <!ELEMENT note (to, from, heading, body)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT from (#PCDATA)>
    <!ELEMENT heading (#PCDATA)>
    <!ELEMENT body (#PCDATA)>
  3. Elements and attributes must conform to definitions

    xml
    <!-- XML Schema -->
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="book">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="author" type="xs:string"/>
            <xs:element name="price" type="xs:decimal"/>
          </xs:sequence>
          <xs:attribute name="id" type="xs:string" use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:schema>

    <!-- Valid XML document -->
    <book id="1">
      <title>XML Guide</title>
      <author>John Doe</author>
      <price>39.95</price>
    </book>

    <!-- Invalid XML document (missing required attribute) -->
    <book>
      <title>XML Guide</title>
      <author>John Doe</author>
      <price>39.95</price>
    </book>
  4. Data types must match

    xml
    <!-- Correct: data types match -->
    <price>39.95</price>

    <!-- Incorrect: data types don't match -->
    <price>thirty-nine point ninety-five</price>
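
The schema and data type requirements above can be sketched in Java with the standard javax.xml.validation API. This is a minimal illustration, not a definitive implementation: the class name BookSchemaCheck, the method check, and the three sample constants are hypothetical names introduced here, and the book schema from the example is embedded as a string so the sketch needs no files.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class BookSchemaCheck {
    // The book schema from the example above, embedded as a string (single-quoted attributes).
    public static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='book'><xs:complexType>"
      + "<xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "<xs:element name='author' type='xs:string'/>"
      + "<xs:element name='price' type='xs:decimal'/>"
      + "</xs:sequence>"
      + "<xs:attribute name='id' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    // Hypothetical sample documents for illustration.
    public static final String VALID =
        "<book id='1'><title>XML Guide</title><author>John Doe</author><price>39.95</price></book>";
    public static final String MISSING_ID =
        "<book><title>XML Guide</title><author>John Doe</author><price>39.95</price></book>";
    public static final String BAD_PRICE =
        "<book id='1'><title>XML Guide</title><author>John Doe</author>"
      + "<price>thirty-nine point ninety-five</price></book>";

    // Returns null when the document is schema-valid, otherwise the first error message.
    public static String check(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the validator throws on the first validation error.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (Exception e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println("valid book:  " + check(VALID));
        System.out.println("missing id:  " + check(MISSING_ID));
        System.out.println("bad price:   " + check(BAD_PRICE));
    }
}
```

All three sample documents are well-formed; only the first one is also valid against the schema. The exact error text depends on the XML implementation bundled with the JDK.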

Comparison of Well-formed vs Valid

| Feature                 | Well-formed                       | Valid                                            |
|-------------------------|-----------------------------------|--------------------------------------------------|
| Syntax requirements     | Must comply with XML syntax rules | Must comply with XML syntax rules                |
| Structure requirements  | Must have single root element     | Must have single root element                    |
| Validation requirements | No DTD or Schema needed           | Requires DTD or Schema                           |
| Constraint checking     | No constraint checking            | Checks the constraints declared in DTD or Schema |
| Data types              | No data type checking             | Checks data types (mainly with XML Schema)       |
| Required elements       | No required element checking      | Checks required elements                         |
| Attribute constraints   | No attribute constraint checking  | Checks attribute constraints                     |
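
To make the comparison concrete, here is a small Java sketch that parses the same document twice, once without and once with DTD validation. It assumes an internal DTD subset so that no external file is needed; the class name WellFormedVsValid, the method parses, and the sample constants are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class WellFormedVsValid {
    // Internal DTD subset: a note must contain to, from, heading, body in that order.
    public static final String DOCTYPE =
        "<!DOCTYPE note ["
      + "<!ELEMENT note (to, from, heading, body)>"
      + "<!ELEMENT to (#PCDATA)>"
      + "<!ELEMENT from (#PCDATA)>"
      + "<!ELEMENT heading (#PCDATA)>"
      + "<!ELEMENT body (#PCDATA)>"
      + "]>";

    public static final String COMPLETE = DOCTYPE
      + "<note><to>John</to><from>Jane</from><heading>Reminder</heading>"
      + "<body>Don't forget the meeting!</body></note>";

    // Well-formed, but the required <body> element is missing.
    public static final String MISSING_BODY = DOCTYPE
      + "<note><to>John</to><from>Jane</from><heading>Reminder</heading></note>";

    // Parses the text, optionally with DTD validation, and reports whether it was accepted.
    public static boolean parses(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                // Validity violations arrive as recoverable errors; rethrow so they fail the parse.
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("complete, no validation:     " + parses(COMPLETE, false));
        System.out.println("complete, validation:        " + parses(COMPLETE, true));
        System.out.println("missing body, no validation: " + parses(MISSING_BODY, false));
        System.out.println("missing body, validation:    " + parses(MISSING_BODY, true));
    }
}
```

The document without a body element is accepted by the non-validating parse and rejected only when validation is switched on, which is the difference the table describes.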

Validating XML Documents

1. Validating Well-formedness

Java Example:

java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class XMLValidator {
    // Returns true if the text can be parsed, i.e. it is well-formed.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // A non-validating parse succeeds only for well-formed input.
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            System.err.println("XML is not well-formed: " + e.getMessage());
            return false;
        }
    }
}
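
As a usage sketch, the same kind of check can be run against several of the incorrect snippets from the well-formedness rules. The class name WellFormedDemo and the method firstError are hypothetical names introduced for this illustration; the approach is the same non-validating DOM parse as in the example above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedDemo {
    // Returns null when the text parses, otherwise the parser's error message.
    public static String firstError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores warnings and recoverable errors.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (Exception e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<root><child>content</child></root>",              // well-formed
            "<child1>content</child1><child2>content</child2>", // no single root element
            "<root><child>content</root>",                      // tag not closed
            "<book id=1>content</book>",                        // attribute value not quoted
            "<root><child>content</Child></root>",              // inconsistent case
            "<data>5 < 10 & 20</data>"                          // unescaped < and &
        };
        for (String sample : samples) {
            String error = firstError(sample);
            System.out.println((error == null ? "well-formed:     " : "not well-formed: ") + sample);
        }
    }
}
```

Only the first sample is expected to parse; the others each break one of the rules listed earlier, and the parser stops at the first violation it meets.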

2. Validating Validity

Validating with DTD:

java
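import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class XMLValidator {
    public static boolean isValidWithDTD(String xml, String dtd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Assumption: dtd holds the DTD text itself; serve it for the
            // external subset named in the document's DOCTYPE declaration.
            builder.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader(dtd)));
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {
                    System.err.println("Warning: " + e.getMessage());
                }
                // Validity violations are reported as recoverable errors;
                // rethrow them, otherwise the method would still return true.
                public void error(SAXParseException e) throws SAXParseException {
                    throw e;
                }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            System.err.println("XML is not valid: " + e.getMessage());
            return false;
        }
    }
}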

Validating with Schema:

java
import java.io.File;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class XMLValidator {
    public static boolean isValidWithSchema(String xml, String schemaPath) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new File(schemaPath));
            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
            docFactory.setNamespaceAware(true); // schema validation needs a namespace-aware parser
            docFactory.setSchema(schema);
            DocumentBuilder builder = docFactory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {
                    System.err.println("Warning: " + e.getMessage());
                }
                // Rethrow validation errors; printing alone would let invalid input return true.
                public void error(SAXParseException e) throws SAXParseException {
                    throw e;
                }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            System.err.println("XML is not valid: " + e.getMessage());
            return false;
        }
    }
}

Best Practices

  1. Always ensure well-formedness: a document that is not well-formed cannot be parsed at all
  2. Prefer Schema validation: XML Schema supports data types and namespaces, which DTD does not
  3. Enable validation during development: catch structural errors early, then adjust the validation level in production as needed
  4. Handle validation errors: provide clear error messages, including the location of the problem, to help with debugging
  5. Use standard tools: rely on mature XML parsers and validation tools
  6. Document constraints: clearly document data structures and constraints
  7. Version control: keep Schemas and DTDs under version control
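
For the error handling practice in particular, one possible approach is to collect every reported problem with its position instead of stopping at the first one. The sketch below assumes the standard javax.xml.validation API; the class name ValidationReport, the method collectErrors, and the small item schema are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class ValidationReport {
    // Hypothetical schema: an <item> with a required sku attribute and an integer <qty>.
    public static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='item'><xs:complexType>"
      + "<xs:sequence><xs:element name='qty' type='xs:integer'/></xs:sequence>"
      + "<xs:attribute name='sku' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    // Returns one message per reported problem; an empty list means the document is valid.
    public static List<String> collectErrors(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { record("warning", e); }
                // Recoverable errors are recorded without throwing, so validation continues.
                public void error(SAXParseException e) { record("error", e); }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    record("fatal", e);
                    throw e; // the document is not well-formed, so parsing cannot continue
                }
                private void record(String level, SAXParseException e) {
                    problems.add(level + " at line " + e.getLineNumber()
                        + ", column " + e.getColumnNumber() + ": " + e.getMessage());
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (Exception e) {
            // A fatal error was already recorded by the handler; record anything else here.
            if (problems.isEmpty()) {
                problems.add("fatal: " + e.getMessage());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Two violations: the sku attribute is missing and qty is not an integer.
        for (String problem : collectErrors("<item><qty>three</qty></item>")) {
            System.out.println(problem);
        }
    }
}
```

How many messages a single violation produces, and their wording, depends on the XML implementation bundled with the JDK, so the report is best treated as diagnostic text rather than a stable format.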

Well-formedness and validity are two levels of XML document quality. Well-formedness ensures that the document can be parsed; validity ensures that it also conforms to the structure and data types declared in a DTD or Schema. In practice, choose the validation level that your application needs.

Tags: XML