XPath (XML Path Language) is a language used to locate and select nodes in XML documents. It provides a concise and powerful way to query data in XML documents, similar to the role of SQL in relational databases.
Basic Concepts of XPath
Node Types
XPath treats XML documents as node trees, containing the following node types:
- Element nodes: XML elements
- Attribute nodes: Attributes of elements
- Text nodes: Character data within elements (attribute values are part of the attribute node, not separate text nodes)
- Namespace nodes: Namespaces in scope for an element
- Processing instruction nodes: XML processing instructions
- Comment nodes: XML comments
- Document nodes: The root node of the entire document
XPath Syntax
1. Basic Path Expressions
```xml
<!-- Example XML -->
<bookstore>
  <book category="web">
    <title lang="en">XML Guide</title>
    <author>John Doe</author>
    <price>39.95</price>
  </book>
  <book category="database">
    <title lang="en">SQL Basics</title>
    <author>Jane Smith</author>
    <price>29.99</price>
  </book>
</bookstore>
```
```xpath
/                  Select the document node (root node)
/bookstore         Select the root element bookstore
/bookstore/book    Select all book elements that are children of bookstore
//book             Select all book elements in the document
bookstore//book    Select all book elements that are descendants of bookstore
```
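As a rough illustration of these path expressions, the sketch below uses Python's standard-library `xml.etree.ElementTree`. It implements only a subset of XPath and evaluates paths relative to an element rather than the document node, so the paths are written relative to `bookstore`. The variable names and the inline copy of the bookstore XML are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Inline copy of the bookstore example so the sketch runs on its own.
XML = """
<bookstore>
  <book category="web">
    <title lang="en">XML Guide</title><author>John Doe</author><price>39.95</price>
  </book>
  <book category="database">
    <title lang="en">SQL Basics</title><author>Jane Smith</author><price>29.99</price>
  </book>
</bookstore>
"""

root = ET.fromstring(XML)  # the <bookstore> root element

children = root.findall("book")      # roughly /bookstore/book
anywhere = root.findall(".//book")   # roughly //book
titles = [t.text for t in root.findall("book/title")]  # /bookstore/book/title

print(root.tag)
print(len(children), len(anywhere))
print(titles)
```

In this small document both queries find the same two elements; they differ only in how much of the tree is searched.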
2. Predicates
Predicates filter a node set by a condition and are written in square brackets []:
```xpath
/bookstore/book[1]              Select the first book element
/bookstore/book[last()]         Select the last book element
/bookstore/book[position()<3]   Select the first two book elements
//book[@category='web']         Select book elements with category attribute 'web'
//book[price>35]                Select book elements with price greater than 35
```
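The predicates above can be tried out with Python's standard-library `xml.etree.ElementTree`. This is a minimal sketch: ElementTree supports positional and attribute predicates but not `position()` or numeric comparisons, so those two are emulated in plain Python. The variable names and the inline XML are assumptions for the example.

```python
import xml.etree.ElementTree as ET

XML = """
<bookstore>
  <book category="web">
    <title lang="en">XML Guide</title><author>John Doe</author><price>39.95</price>
  </book>
  <book category="database">
    <title lang="en">SQL Basics</title><author>Jane Smith</author><price>29.99</price>
  </book>
</bookstore>
"""
root = ET.fromstring(XML)

books = root.findall("book")
first = root.find("book[1]")                     # /bookstore/book[1]
last = root.find("book[last()]")                 # /bookstore/book[last()]
web = root.findall(".//book[@category='web']")   # //book[@category='web']

# Not supported by ElementTree's XPath subset, so emulated in Python:
first_two = books[:2]                            # book[position()<3]
pricey = [b for b in books if float(b.findtext("price")) > 35]  # book[price>35]

print(first.findtext("title"), "|", last.findtext("title"))
print([b.findtext("title") for b in pricey])
```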
3. Wildcards
```xpath
*          Match any element node
@*         Match any attribute node
node()     Match any type of node
//book/*   Select all child elements of book elements
```
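A small sketch of the element wildcard, again assuming Python's standard-library `xml.etree.ElementTree`. That module cannot select attribute nodes with `@*`, so the example reads an element's attributes from its `attrib` mapping instead. The names used are illustrative only.

```python
import xml.etree.ElementTree as ET

XML = """
<bookstore>
  <book category="web">
    <title lang="en">XML Guide</title><author>John Doe</author><price>39.95</price>
  </book>
  <book category="database">
    <title lang="en">SQL Basics</title><author>Jane Smith</author><price>29.99</price>
  </book>
</bookstore>
"""
root = ET.fromstring(XML)

all_children = root.findall("*")            # any child element of bookstore
book_children = root.findall(".//book/*")   # roughly //book/*
tags = [e.tag for e in book_children]

# Stand-in for @*: every attribute of the first book, as a dict.
first_book_attrs = dict(root.find("book").attrib)

print([e.tag for e in all_children])
print(tags)
print(first_book_attrs)
```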
4. Axes
Axes define a set of nodes relative to the current node:
```xpath
ancestor             Select all ancestor nodes of the current node
ancestor-or-self     Select the current node and all its ancestor nodes
attribute            Select all attribute nodes of the current node
child                Select all child nodes of the current node
descendant           Select all descendant nodes of the current node
descendant-or-self   Select the current node and all its descendant nodes
following            Select all nodes after the current node in document order, excluding its descendants
following-sibling    Select all sibling nodes after the current node
namespace            Select all namespace nodes of the current node
parent               Select the parent node of the current node
preceding            Select all nodes before the current node in document order, excluding its ancestors
preceding-sibling    Select all sibling nodes before the current node
self                 Select the current node itself
```
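To make a few of these axes concrete, here is a sketch that computes them by hand over a tree parsed with Python's standard-library `xml.etree.ElementTree`. That module supports only the `..` (parent) step and no named axes, so the helper functions `ancestors`, `following_siblings`, and `preceding_siblings` are hypothetical names written for this example. They cover element nodes only, since ElementTree does not model the document node.

```python
import xml.etree.ElementTree as ET

XML = """
<bookstore>
  <book category="web">
    <title lang="en">XML Guide</title><author>John Doe</author><price>39.95</price>
  </book>
</bookstore>
"""
root = ET.fromstring(XML)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    """ancestor axis: parent, grandparent, ... up to the root element."""
    found = []
    while elem in parent_of:
        elem = parent_of[elem]
        found.append(elem)
    return found

def following_siblings(elem):
    """following-sibling axis (elem must not be the root element)."""
    siblings = list(parent_of[elem])
    return siblings[siblings.index(elem) + 1:]

def preceding_siblings(elem):
    """preceding-sibling axis, returned in document order."""
    siblings = list(parent_of[elem])
    return siblings[:siblings.index(elem)]

author = root.find("book/author")
print([e.tag for e in ancestors(author)])
print([e.tag for e in following_siblings(author)])
print([e.tag for e in preceding_siblings(author)])
print([e.tag for e in root.findall(".//title/..")])  # parent step built into ElementTree
```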
5. Operators
```xpath
Arithmetic operators:   + - * div mod
Comparison operators:   = != < > <= >=
Boolean operators:      and or   (negation uses the not() function)
Node-set operator:      |        (union)
```
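Two operator behaviours tend to surprise people coming from other languages: `mod` keeps the sign of the dividend, and comparing a node set with a number is true if any node satisfies the comparison. The sketch below mimics both in Python. The helper names `xp_mod` and `xp_greater` are made up for the example, and division by zero (which XPath turns into NaN or Infinity) is not handled.

```python
import math

def xp_mod(a, b):
    """XPath mod: remainder of truncating division; the sign follows the dividend.
    Unlike XPath, this raises ValueError when b is 0."""
    return math.fmod(a, b)

def xp_greater(values, threshold):
    """node-set > number: true if ANY node's numeric value exceeds the threshold."""
    return any(float(v) > threshold for v in values)

prices = ["39.95", "29.99"]  # string-values of the two price elements

print(xp_mod(5, 2), xp_mod(-5, 2))   # XPath-style remainder
print(-5 % 2)                        # Python's % follows the divisor instead
print(xp_greater(prices, 35), xp_greater(prices, 40))
```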
XPath Functions
Node Set Functions
```xpath
count(//book)     Count the number of book elements
id('b1')          Select the element with ID 'b1'
local-name()      Return the local name of the node
namespace-uri()   Return the namespace URI of the node
name()            Return the name of the node
```
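Python's standard-library `xml.etree.ElementTree` does not evaluate XPath functions, but the same information is available from the parsed tree. The sketch below shows rough equivalents of `count()`, `local-name()`, and `namespace-uri()`. The namespaced document, its URI, and the helper `split_name` are all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced document; the URI is made up for the example.
XML = '<b:bookstore xmlns:b="http://example.com/books"><b:book/><b:book/></b:bookstore>'
root = ET.fromstring(XML)

def split_name(elem):
    """Return (namespace-uri, local-name) from ElementTree's '{uri}local' tag form."""
    if elem.tag.startswith("{"):
        uri, local = elem.tag[1:].split("}", 1)
        return uri, local
    return "", elem.tag

# Rough equivalent of count(...) for the book children of the root element.
book_count = len(root.findall("{http://example.com/books}book"))
uri, local = split_name(root)

print(book_count)
print(uri, local)
```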
String Functions
```xpath
string()                           Convert node to string
concat('Hello', ' ', 'World')      Concatenate strings
starts-with(text, 'XML')           Check if it starts with the specified string
contains(text, 'XML')              Check if it contains the specified string
substring(text, 1, 3)              Extract substring
string-length(text)                Return string length
normalize-space(text)              Normalize whitespace characters
translate(text, 'abc', 'XYZ')      Character replacement
```
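Some of these string functions behave differently from their nearest counterparts in general-purpose languages: `substring()` counts positions from 1, and `translate()` maps characters one by one and drops those with no counterpart. The Python sketch below imitates that behaviour. The `xp_*` helpers are hypothetical names, and `xp_substring` handles integer arguments only.

```python
import re

def xp_substring(s, start, length):
    """substring(): positions are 1-based (integer arguments only in this sketch)."""
    begin = max(start, 1) - 1
    end = max(start + length, 1) - 1
    return s[begin:end]

def xp_normalize_space(s):
    """normalize-space(): collapse runs of XML whitespace to one space, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def xp_translate(s, src, dst):
    """translate(): map src[i] to dst[i]; characters with no counterpart are removed."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) not in table:  # the first occurrence in src wins
            table[ord(ch)] = dst[i] if i < len(dst) else None
    return s.translate(table)

title = "XML Guide"
print(xp_substring(title, 1, 3))                            # substring(title, 1, 3)
print(title.startswith("XML"), "XML" in title, len(title))  # starts-with, contains, string-length
print(xp_normalize_space("  XML   Guide \n"))
print(xp_translate("bar", "abc", "ABC"), xp_translate("--aaa--", "abc-", "ABC"))
```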
Boolean Functions
```xpath
boolean()   Convert to boolean
not()       Logical NOT
true()      Return true
false()     Return false
lang()      Check the language of the current node (taken from xml:lang attributes)
```
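The conversion rules behind `boolean()` are worth spelling out: a node set is true when it is non-empty, a string is true when it is non-empty, and a number is true unless it is zero or NaN. The sketch below encodes those rules in Python, modelling a node set as a plain list. The function name `xp_boolean` is an assumption for the example.

```python
import math

def xp_boolean(value):
    """Mimic XPath 1.0 boolean() for booleans, numbers, strings and node sets."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0 and not math.isnan(value)
    if isinstance(value, str):
        return len(value) > 0
    return len(value) > 0  # node set modelled as a Python list

print(xp_boolean("false"))        # a non-empty string is true, whatever it says
print(xp_boolean(""), xp_boolean(0), xp_boolean(float("nan")))
print(xp_boolean([]), xp_boolean(["node"]))
```

One consequence is that the string 'false' converts to true, which is why `true()` and `false()` exist as functions.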
Number Functions
```xpath
number()        Convert to number
sum(//price)    Calculate sum
floor(3.7)      Round down
ceiling(3.2)    Round up
round(3.7)      Round to nearest integer
```
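The number functions can be approximated in Python as follows. This is only a sketch: `float()` raises an error where XPath's `number()` would return NaN, and the helper `xp_round` (a made-up name) does not handle NaN or infinities. It does show one real difference, namely that XPath's `round()` sends ties toward positive infinity while Python's built-in `round()` sends them to the nearest even integer.

```python
import math
import xml.etree.ElementTree as ET

XML = ("<bookstore><book><price>39.95</price></book>"
       "<book><price>29.99</price></book></bookstore>")
root = ET.fromstring(XML)

prices = [float(p.text) for p in root.iter("price")]  # number() applied to each price
total = sum(prices)                                   # sum(//price)

def xp_round(x):
    """XPath round(): nearest integer, ties go toward positive infinity."""
    return math.floor(x + 0.5)

print(f"{total:.2f}")
print(math.floor(3.7), math.ceil(3.2), xp_round(3.7))  # floor, ceiling, round
print(xp_round(2.5), xp_round(-2.5), round(2.5))       # tie handling differs from Python
```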
Practical Application Examples of XPath
1. Using XPath in Java
```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class XPathExample {
    public static void main(String[] args) throws Exception {
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();

        // Parse XML document
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = docFactory.newDocumentBuilder();
        Document doc = builder.parse(new File("books.xml"));

        // Execute XPath query; this overload returns Object, so cast to String
        String expression = "//book[@category='web']/title/text()";
        String title = (String) xpath.evaluate(expression, doc, XPathConstants.STRING);
        System.out.println("Title: " + title);

        // Get node list
        NodeList books = (NodeList) xpath.evaluate("//book", doc, XPathConstants.NODESET);
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            System.out.println(book.getAttribute("category"));
        }
    }
}
```
2. Using XPath in Python (lxml)
```python
from lxml import etree

# Parse XML document
tree = etree.parse("books.xml")

# Execute XPath query
titles = tree.xpath("//book[@category='web']/title/text()")
for title in titles:
    print(f"Title: {title}")

# Get attributes
categories = tree.xpath("//book/@category")
for category in categories:
    print(f"Category: {category}")

# Use functions: text() results are strings, so let XPath's sum() add the prices
total_price = tree.xpath("sum(//book/price)")
print(f"Total price: {total_price}")
```
3. Using XPath in JavaScript
```javascript
// Parse XML
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(xmlString, "text/xml");

// Execute XPath query
const result = xmlDoc.evaluate(
  "//book[@category='web']/title",
  xmlDoc,
  null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
  null
);

for (let i = 0; i < result.snapshotLength; i++) {
  const node = result.snapshotItem(i);
  console.log(node.textContent);
}
```
Relationship Between XPath and XQuery
XQuery is a query language built on XPath that extends XPath's capabilities:
- XPath: Used for locating and selecting nodes
- XQuery: Used for querying, transforming, and constructing XML data
XPath Best Practices
- Use absolute paths: When the document structure is fixed, an explicit path such as /bookstore/book tells the processor exactly where to look, so it does not have to search the whole tree
- Avoid using //: // searches the entire document, affecting performance
- Use predicates for filtering: Filter nodes early to reduce the amount of data processed
- Utilize indexes: XPath itself does not define indexes, so for large documents consider a processor or XML database that can index the data to optimize queries
- Cache query results: For frequently executed queries, cache results
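Two of these practices, preferring explicit paths and caching results, can be sketched with Python's standard library. The helper `cached_findall` is a hypothetical name, and the cache is only valid while the document is not modified. This is an illustration of the idea, not a benchmark.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

XML = """
<bookstore>
  <book category="web"><title lang="en">XML Guide</title><price>39.95</price></book>
  <book category="database"><title lang="en">SQL Basics</title><price>29.99</price></book>
</bookstore>
"""
root = ET.fromstring(XML)

@lru_cache(maxsize=128)
def cached_findall(path):
    """Hypothetical helper: memoise query results for an unchanging document."""
    return tuple(root.findall(path))

direct = cached_findall("book/title")  # explicit path: follows bookstore -> book -> title
scanned = cached_findall(".//title")   # descendant search: walks the whole tree
repeat = cached_findall("book/title")  # served from the cache

print(direct == scanned)
print(cached_findall.cache_info().hits)
```

Both paths return the same nodes here; the explicit one simply gives the processor less to search.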
XPath is a powerful tool for processing XML data. Mastering XPath can greatly improve the efficiency and flexibility of XML data processing.