XML parsing is the process of converting an XML document into data structures that applications can process. There are two main parsing methods: DOM (Document Object Model) and SAX (Simple API for XML).
DOM Parsing
DOM is a tree-based parsing method that loads the entire XML document into memory and builds a tree structure.
Characteristics of DOM Parsing
- High memory usage: Requires loading the entire document into memory
- Random access: Can randomly access any part of the document
- Bidirectional traversal: Can traverse the document forward and backward
- Modification capability: Can modify document structure and content
- Suitable for small documents: Best for small to medium XML documents that fit comfortably in memory
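The random-access and modification points above can be illustrated with the DOM and transformation APIs that ship with the JDK. The following is a minimal sketch, assuming a hypothetical `<library>` document with `<book>`/`<title>` children supplied as a string; the class name `DomModifyDemo` is illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomModifyDemo {
    // Parses the XML, jumps straight to the second <book>, changes its title,
    // appends a new <book>, and serializes the modified tree back to a string.
    static String modify(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Random access: index directly into the node list
        Element second = (Element) doc.getElementsByTagName("book").item(1);
        second.getElementsByTagName("title").item(0).setTextContent("Updated Title");

        // Modification: create a new element and attach it to the root
        Element newBook = doc.createElement("book");
        Element newTitle = doc.createElement("title");
        newTitle.setTextContent("New Book");
        newBook.appendChild(newTitle);
        doc.getDocumentElement().appendChild(newBook);

        // Serialize with an identity transformer, omitting the XML declaration
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book><title>A</title></book><book><title>B</title></book></library>";
        System.out.println(modify(xml));
    }
}
```

Because the whole tree is in memory, the code can reach the second book without visiting the first and can edit the structure in place. A streaming parser cannot do either.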
DOM Parsing Example (Java)
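```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// The enclosing method must handle ParserConfigurationException, SAXException and IOException
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File("data.xml"));

// Get the root element
Element root = document.getDocumentElement();

// Get all <book> elements below the root
NodeList books = root.getElementsByTagName("book");
for (int i = 0; i < books.getLength(); i++) {
    Element book = (Element) books.item(i);
    String title = book.getElementsByTagName("title")
            .item(0)
            .getTextContent();
    System.out.println("Title: " + title);
}
```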
SAX Parsing
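SAX is an event-based streaming parsing method. The parser reads the XML document sequentially from start to finish (not line by line) and calls handler methods as it encounters the start and end of the document, start and end tags, character data, and other constructs.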
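To make the event model concrete, the following sketch records every callback the JDK's built-in SAX parser fires for a tiny inline document. The class name `SaxEventTrace` and the sample XML are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace {
    // Records each callback in the order the parser fires it.
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() { events.add("startDocument"); }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("startElement " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("characters \"" + new String(ch, start, length) + "\"");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }

            @Override
            public void endDocument() { events.add("endDocument"); }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        trace("<book><title>XML</title></book>").forEach(System.out::println);
    }
}
```

For `<book><title>XML</title></book>` the recorded order is startDocument, startElement book, startElement title, characters "XML", endElement title, endElement book, endDocument. The parser never builds a tree; the handler sees each piece once, in document order.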
Characteristics of SAX Parsing
- Low memory usage: Does not need to load the entire document into memory
- Sequential access: Can only access the document sequentially
- Unidirectional traversal: Can only traverse forward
- Read-only mode: Cannot modify the document
- Suitable for large documents: Handles large XML documents, including ones too big to load into memory
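The low-memory, forward-only style above suits single-pass aggregation. Here is a minimal sketch, assuming a hypothetical document in which each `<book>` contains a numeric `<price>` element. The handler keeps only a count, a running total, and a small text buffer, so its state does not grow with the number of books.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts <book> elements and sums their <price> values in one pass.
public class BookStatsHandler extends DefaultHandler {
    int bookCount = 0;
    double priceTotal = 0.0;
    private final StringBuilder text = new StringBuilder();
    private boolean inPrice = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("book")) {
            bookCount++;
        } else if (qName.equals("price")) {
            inPrice = true;
            text.setLength(0); // reset the buffer for this price
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inPrice) {
            text.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("price")) {
            priceTotal += Double.parseDouble(text.toString().trim());
            inPrice = false;
        }
    }

    // Convenience entry point: parse an in-memory string with this handler.
    static BookStatsHandler scan(String xml) throws Exception {
        BookStatsHandler handler = new BookStatsHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }
}
```

For a real large file, `parse` would be given a `File` or `InputStream` instead of a string. The handler code stays the same.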
SAX Parsing Example (Java)
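```java
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();

DefaultHandler handler = new DefaultHandler() {
    private boolean inTitle = false;
    // characters() may deliver one text node in several chunks, so buffer it
    private final StringBuilder titleText = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("title")) {
            inTitle = true;
            titleText.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            titleText.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            System.out.println("Title: " + titleText);
            inTitle = false;
        }
    }
};

saxParser.parse(new File("data.xml"), handler);
```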
Comparison of DOM and SAX
| Feature | DOM | SAX |
|---|---|---|
| Memory usage | High | Low |
| Access method | Random access | Sequential access |
| Traversal direction | Bidirectional | Unidirectional |
| Modification capability | Modifiable | Read-only |
| Parsing speed | Usually slower (builds the whole tree) | Usually faster (no tree is built) |
| Suitable scenarios | Small documents, need modification | Large documents, read-only |
Other Parsing Methods
1. StAX (Streaming API for XML)
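StAX is a pull-based streaming API. Like SAX, it reads the document forward-only with low memory use. Instead of the parser pushing events to callbacks, however, the application pulls the next event when it is ready, which gives it direct control over the parsing loop.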
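```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

XMLInputFactory factory = XMLInputFactory.newInstance();
try (InputStream in = new FileInputStream("data.xml")) {
    XMLStreamReader reader = factory.createXMLStreamReader(in);
    try {
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals("title")) {
                // getElementText() reads the text and leaves the reader on </title>
                System.out.println("Title: " + reader.getElementText());
            }
        }
    } finally {
        // XMLStreamReader is not AutoCloseable, and close() does not close the stream
        reader.close();
    }
}
```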
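Because the application drives the loop, it can stop as soon as it has what it needs. The following is a minimal sketch, assuming an in-memory string and a hypothetical helper named `firstTitle`, that returns the text of the first `<title>` and then stops pulling events.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxFirstMatch {
    // Returns the first <title> text, or null if the document has none.
    static String firstTitle(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The caller asks for the next event only when it wants one
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("title")) {
                    return reader.getElementText(); // stop here; no further events are pulled
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstTitle(
                "<library><book><title>First</title></book><book><title>Second</title></book></library>"));
    }
}
```

With SAX, the same early exit needs a workaround such as throwing an exception from the handler. With StAX it is an ordinary `return`.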
2. JAXB (Java Architecture for XML Binding)
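JAXB provides automatic binding between XML and Java objects: annotated classes are unmarshalled from XML and marshalled back to XML. Note that JAXB was removed from the JDK in Java 11. On current Java versions it is added as a separate dependency and is now maintained as Jakarta XML Binding (package jakarta.xml.bind).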
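```java
import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;

// Book is an application class annotated with @XmlRootElement
JAXBContext context = JAXBContext.newInstance(Book.class);
Unmarshaller unmarshaller = context.createUnmarshaller();
Book book = (Book) unmarshaller.unmarshal(new File("book.xml"));
```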
Recommendations for Choosing Parsing Methods
- Choose DOM: When you need random access, document modification, and the document is small
- Choose SAX: When processing large documents and only need sequential reading
- Choose StAX: When you need streaming efficiency on large documents but want to control the parsing loop yourself, for example to stop early or skip sections
- Choose JAXB: When you need to convert between XML and object models
Performance Optimization Recommendations
- Use appropriate parsers: Choose the right parsing method based on document size and requirements
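- Validate deliberately: Schema validation adds parsing overhead. Enable it during development and wherever input is untrusted or externally supplied; skipping it is reasonable only for trusted data that has already been validated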
- Cache parsing results: Cache parsing results for frequently accessed documents
- Use streaming processing: Use SAX or StAX for streaming processing of large documents
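For the validation advice above, the standard javax.xml.validation API separates compiling a schema (expensive) from validating a document (cheaper), so the compiled Schema can be reused. The following is a minimal sketch, assuming a hypothetical inline XSD that requires a `<book>` to contain one `<title>`; the class and method names are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaCheck {
    // Hypothetical schema: <book> must contain exactly one <title> string.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='book'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Compile once: a Schema object is thread-safe and can be shared.
    static final Schema SCHEMA;
    static {
        try {
            SCHEMA = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Validator is not thread-safe, so create a fresh one per call.
    static boolean isValid(String xml) {
        Validator validator = SCHEMA.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document violates the schema (or is not well-formed)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<book><title>XML Basics</title></book>"));
        System.out.println(isValid("<book><author>Someone</author></book>"));
    }
}
```

Turning validation off for trusted input then amounts to skipping the `isValid` call, while the compiled schema remains available wherever validation is still required.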
XML parsing is a core technology for processing XML data. Choosing the right parsing method can significantly improve application performance and maintainability.