Well-formedness and validity are two related but distinct quality criteria for XML documents: the first concerns syntax, the second concerns conformance to a declared structure.
Well-formed
Well-formed means that an XML document complies with XML syntax rules and can be correctly parsed by an XML parser.
Requirements for Well-formedness
- Must have a single root element

```xml
<!-- Correct: has single root element -->
<root>
  <child>content</child>
</root>

<!-- Incorrect: no single root element -->
<child1>content</child1>
<child2>content</child2>
```

- All tags must be properly closed

```xml
<!-- Correct: all tags are closed -->
<root>
  <child>content</child>
</root>

<!-- Incorrect: tags not closed -->
<root>
  <child>content
</root>
```

- Tags must be properly nested

```xml
<!-- Correct: properly nested -->
<root>
  <parent>
    <child>content</child>
  </parent>
</root>

<!-- Incorrect: improperly nested -->
<root>
  <parent>
    <child>content
  </parent>
  </child>
</root>
```

- Attribute values must be enclosed in quotes

```xml
<!-- Correct: attribute values in quotes -->
<book id="1" category="web">content</book>

<!-- Incorrect: attribute values not in quotes -->
<book id=1 category=web>content</book>
```

- Tag names are case-sensitive

```xml
<!-- Correct: consistent case -->
<root>
  <Child>content</Child>
</root>

<!-- Incorrect: inconsistent case -->
<root>
  <child>content</Child>
</root>
```

- Special characters must be escaped

```xml
<!-- Correct: special characters escaped -->
<data>5 &lt; 10 &amp; 20 &gt; 15</data>

<!-- Incorrect: special characters not escaped -->
<data>5 < 10 & 20 > 15</data>
```

- Empty elements must be properly represented

```xml
<!-- Correct: empty element representation -->
<empty/>
<empty></empty>

<!-- Incorrect: improper empty element representation -->
<empty>
```
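The rules above can be tried out by feeding small snippets to a standard parser and recording which ones it rejects. The following is a minimal sketch, assuming the JDK's built-in JAXP parser; the class name `WellFormednessRules`, the method `parses`, and the sample strings are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: names and sample snippets are illustrative only.
final class WellFormednessRules {

    // Returns true when the parser accepts the text, false on any parse failure.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors; setting a handler also
            // replaces the parser's default reporting to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One sample per rule; insertion order is kept for readable output.
        Map<String, String> samples = new LinkedHashMap<>();
        samples.put("single root", "<root><child>content</child></root>");
        samples.put("two roots", "<child1>content</child1><child2>content</child2>");
        samples.put("unclosed tag", "<root><child>content</root>");
        samples.put("bad nesting", "<root><parent><child>content</parent></child></root>");
        samples.put("unquoted attribute", "<book id=1>content</book>");
        samples.put("case mismatch", "<root><child>content</Child></root>");
        samples.put("escaped characters", "<data>5 &lt; 10 &amp; 20 &gt; 15</data>");
        samples.put("unescaped characters", "<data>5 < 10 & 20 > 15</data>");
        samples.put("empty element", "<empty/>");
        samples.put("unterminated element", "<empty>");

        samples.forEach((label, xml) ->
                System.out.println(label + ": " + (parses(xml) ? "well-formed" : "rejected")));
    }
}
```

Only the "single root", "escaped characters", and "empty element" samples should be reported as well-formed; every other sample breaks one of the rules listed above.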
Valid
Valid means that an XML document is not only well-formed but also conforms to a specific Document Type Definition (DTD) or XML Schema.
Requirements for Validity
- Must be well-formed: a document that breaks any well-formedness rule cannot be valid

- Must conform to DTD or Schema

```xml
<?xml version="1.0"?>
<!-- XML document -->
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>John</to>
  <from>Jane</from>
  <heading>Reminder</heading>
  <body>Don't forget the meeting!</body>
</note>
```

```xml
<!-- DTD file (note.dtd) -->
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
```

- Elements and attributes must conform to definitions

```xml
<!-- XML Schema -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="price" type="xs:decimal"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- Valid XML document -->
<book id="1">
  <title>XML Guide</title>
  <author>John Doe</author>
  <price>39.95</price>
</book>

<!-- Invalid XML document (missing required attribute) -->
<book>
  <title>XML Guide</title>
  <author>John Doe</author>
  <price>39.95</price>
</book>
```

- Data types must match (here `price` is declared as `xs:decimal` in the Schema above)

```xml
<!-- Correct: data types match -->
<price>39.95</price>

<!-- Incorrect: data types don't match -->
<price>thirty-nine point ninety-five</price>
```
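To see these validity requirements enforced, the book Schema from this section can be compiled in memory and applied to the three cases discussed: a conforming document, one missing the required `id` attribute, and one with a non-decimal `price`. This is a minimal sketch using the JDK's `javax.xml.validation` API; the class name `BookSchemaCheck` and the method `isValid` are hypothetical, and the Schema is embedded as a string purely to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class; the Schema text mirrors the book example in this section.
final class BookSchemaCheck {

    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='book'><xs:complexType>"
          + "<xs:sequence>"
          + "<xs:element name='title' type='xs:string'/>"
          + "<xs:element name='author' type='xs:string'/>"
          + "<xs:element name='price' type='xs:decimal'/>"
          + "</xs:sequence>"
          + "<xs:attribute name='id' type='xs:string' use='required'/>"
          + "</xs:complexType></xs:element>"
          + "</xs:schema>";

    // Returns true when the document satisfies the Schema, false otherwise.
    static boolean isValid(String xml) {
        Validator validator;
        try {
            validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator();
        } catch (SAXException e) {
            throw new IllegalStateException("The Schema itself could not be compiled", e);
        }
        try {
            // With no ErrorHandler set, the first validity error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<title>XML Guide</title><author>John Doe</author>";
        System.out.println(isValid("<book id='1'>" + body + "<price>39.95</price></book>")); // true
        System.out.println(isValid("<book>" + body + "<price>39.95</price></book>"));        // false
        System.out.println(isValid("<book id='1'>" + body + "<price>cheap</price></book>")); // false
    }
}
```

All three inputs are well-formed; only the Schema distinguishes them.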
Comparison of Well-formed vs Valid
| Feature | Well-formed | Valid |
|---|---|---|
| Syntax requirements | Must comply with XML syntax rules | Must comply with XML syntax rules |
| Structure requirements | Must have single root element | Must have single root element |
| Validation requirements | No DTD or Schema needed | Requires DTD or Schema |
| Constraint checking | No constraint checking | Checks the constraints declared in the DTD or Schema |
| Data types | No data type checking | Checks data types |
| Required elements | No required element checking | Checks required elements |
| Attribute constraints | No attribute constraint checking | Checks attribute constraints |
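The two columns of the table correspond to two severities that a validating SAX-based parser reports separately: well-formedness violations arrive as fatal errors, while validity violations arrive as recoverable errors. The sketch below uses that split to classify a document into one of three levels. It assumes a DTD supplied as an internal subset so the example needs no files; the class `ConformanceLevel`, the enum `Level`, and the sample DTD are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; names and the sample DTD are illustrative only.
final class ConformanceLevel {

    enum Level { NOT_WELL_FORMED, WELL_FORMED_ONLY, VALID }

    // Internal DTD subset: a note must contain exactly <to> followed by <from>.
    static final String DTD =
            "<!DOCTYPE note ["
          + "<!ELEMENT note (to, from)>"
          + "<!ELEMENT to (#PCDATA)>"
          + "<!ELEMENT from (#PCDATA)>"
          + "]>";

    static Level classify(String xml) {
        boolean[] validityErrorSeen = {false};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // DTD validation on
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                // Validity violations: remember them, but let parsing continue.
                @Override public void error(SAXParseException e) { validityErrorSeen[0] = true; }
                // Well-formedness violations: abort the parse.
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // With in-memory input, the expected failure here is a fatal parse error.
            return Level.NOT_WELL_FORMED;
        }
        return validityErrorSeen[0] ? Level.WELL_FORMED_ONLY : Level.VALID;
    }

    public static void main(String[] args) {
        System.out.println(classify(DTD + "<note><to>John</to><from>Jane</from></note>")); // VALID
        System.out.println(classify(DTD + "<note><to>John</to></note>"));                  // WELL_FORMED_ONLY
        System.out.println(classify(DTD + "<note><to>John</note>"));                       // NOT_WELL_FORMED
    }
}
```

Note that a document without any DOCTYPE is also classified as `WELL_FORMED_ONLY` here, because a validating parser has no grammar to check it against.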
Validating XML Documents
1. Validating Well-formedness
Java Example:
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class XMLValidator {
    // Returns true if the parser accepts the text, i.e. the document is well-formed.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // parse() throws on any well-formedness violation.
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            System.err.println("XML is not well-formed: " + e.getMessage());
            return false;
        }
    }
}
```
2. Validating Validity
Validating with DTD:
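```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XMLValidator {
    // Assumption: `xml` contains a DOCTYPE declaration and `dtd` holds the DTD text itself.
    public static boolean isValidWithDTD(String xml, String dtd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve the supplied DTD text for the external subset named in the DOCTYPE.
            builder.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader(dtd)));
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {
                    System.err.println("Warning: " + e.getMessage());
                }
                // Validity violations are reported here; rethrow so that parse() fails.
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            System.err.println("XML is not valid: " + e.getMessage());
            return false;
        }
    }
}
```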
Validating with Schema:
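```java
import java.io.File;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XMLValidator {
    public static boolean isValidWithSchema(String xml, String schemaPath) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new File(schemaPath));
            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
            docFactory.setNamespaceAware(true); // Schema validation needs namespace awareness
            docFactory.setSchema(schema);
            DocumentBuilder builder = docFactory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {
                    System.err.println("Warning: " + e.getMessage());
                }
                // Validity violations are reported here; rethrow so that parse() fails.
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            System.err.println("XML is not valid: " + e.getMessage());
            return false;
        }
    }
}
```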
Best Practices
- Always ensure well-formedness: Well-formedness is a basic requirement for XML documents
- Use Schema validation: XML Schema supports data types and namespaces, which DTDs do not
- Enable validation during development: keep validation switched on while developing, then adjust it as needed in production
- Handle validation errors: Provide clear error messages to help with debugging
- Use standard tools: Use mature XML parsers and validation tools
- Document constraints: Clearly document data structures and constraints
- Version control: keep Schemas and DTDs under version control
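As one way to act on the advice about clear error messages, an error handler can collect every problem together with its position instead of stopping at the first one. The following is a rough sketch under stated assumptions: the class `ValidationReport`, the methods `problems` and `describe`, and the small price-list Schema are all hypothetical; only the JDK's `javax.xml.validation` and SAX interfaces are real.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: gathers all validation problems with line and column numbers.
final class ValidationReport {

    static List<String> problems(String xsd, String xml) {
        List<String> found = new ArrayList<>();
        Validator validator;
        try {
            validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
        } catch (SAXException e) {
            throw new IllegalArgumentException("The Schema itself could not be compiled", e);
        }
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { found.add(describe("warning", e)); }
            // Not rethrowing lets validation continue and report later problems too.
            @Override public void error(SAXParseException e) { found.add(describe("error", e)); }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                found.add(describe("fatal", e));
                throw e; // the document is not well-formed, so validation cannot go on
            }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Normally already recorded by fatalError; keep the message if it was not.
            if (found.isEmpty()) found.add("fatal: " + e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    static String describe(String severity, SAXParseException e) {
        return severity + " at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        String xsd =
                "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
              + "<xs:element name='prices'><xs:complexType><xs:sequence>"
              + "<xs:element name='price' type='xs:decimal' maxOccurs='unbounded'/>"
              + "</xs:sequence></xs:complexType></xs:element>"
              + "</xs:schema>";
        String xml = "<prices>\n<price>39.95</price>\n<price>cheap</price>\n<price>free</price>\n</prices>";
        problems(xsd, xml).forEach(System.out::println);
    }
}
```

Returning a list rather than printing leaves it to the caller to decide whether problems are logged, shown to a user, or turned into a failed build.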
Well-formedness and validity are two important dimensions of XML document quality. Well-formedness ensures that the document can be parsed, while validity ensures that it also conforms to the structure and data-type constraints declared in its DTD or Schema. In practice, choose the validation level that matches your needs.