What is XPath and how do you use it to query data in XML?

February 21, 14:22

XPath (XML Path Language) is a language used to locate and select nodes in XML documents. It provides a concise and powerful way to query data in XML documents, similar to the role of SQL in relational databases.

Basic Concepts of XPath

Node Types

XPath treats XML documents as node trees, containing the following node types:

  1. Element nodes: XML elements
  2. Attribute nodes: Attributes of elements
  3. Text nodes: Character data inside elements (an attribute's value belongs to its attribute node, not to a separate text node)
  4. Namespace nodes: Namespaces of elements
  5. Processing instruction nodes: XML processing instructions
  6. Comment nodes: XML comments
  7. Document nodes: The root node of the entire document
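
To see how these node types surface in practice, here is a minimal sketch using the third-party lxml library for Python (the same library used later in this article). The sample document, including its comment and the `render` processing instruction, is made up for illustration.

```python
from lxml import etree

# Hypothetical sample containing several node types
doc = etree.fromstring(
    '<bookstore><!-- catalog --><?render fast?>'
    '<book category="web"><title>XML Guide</title></book></bookstore>'
)

elements = doc.xpath("//book")                          # element nodes
attributes = doc.xpath("//book/@category")              # attribute nodes (returned as strings)
texts = doc.xpath("//title/text()")                     # text nodes (returned as strings)
comments = doc.xpath("//comment()")                     # comment nodes
instructions = doc.xpath("//processing-instruction()")  # processing instruction nodes

print(elements[0].tag, attributes, texts)
print(repr(comments[0].text), instructions[0].target)
```

lxml maps attribute and text nodes to Python strings, while elements, comments, and processing instructions come back as node objects.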

XPath Syntax

1. Basic Path Expressions

```xml
<!-- Example XML -->
<bookstore>
  <book category="web">
    <title lang="en">XML Guide</title>
    <author>John Doe</author>
    <price>39.95</price>
  </book>
  <book category="database">
    <title lang="en">SQL Basics</title>
    <author>Jane Smith</author>
    <price>29.99</price>
  </book>
</bookstore>
```
```xpath
/                  Select the document node (root node)
/bookstore         Select the root element bookstore
/bookstore/book    Select all book elements that are children of bookstore
//book             Select all book elements in the document
bookstore//book    Select all book elements that are descendants of bookstore
```
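
The path expressions can be tried out with a short script. This is a sketch that assumes lxml is installed and uses a trimmed copy of the bookstore sample; the variable names are illustrative only.

```python
from lxml import etree

# Trimmed copy of the bookstore sample
XML = """<bookstore>
  <book category="web"><title lang="en">XML Guide</title><price>39.95</price></book>
  <book category="database"><title lang="en">SQL Basics</title><price>29.99</price></book>
</bookstore>"""
root = etree.fromstring(XML)

root_element = root.xpath("/*")[0]           # the single root element
children = root.xpath("/bookstore/book")     # absolute path, one step at a time
anywhere = root.xpath("//title/text()")      # title elements at any depth
relative = root.xpath("book/title/text()")   # relative to the context node (bookstore)

print(root_element.tag, len(children), anywhere, relative)
```

An absolute path always starts from the document root, whereas a relative path starts from whichever node the query is evaluated on.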

2. Predicates

Predicates are used to find specific nodes and are placed in square brackets []:

```xpath
/bookstore/book[1]               Select the first book element
/bookstore/book[last()]          Select the last book element
/bookstore/book[position()<3]    Select the first two book elements
//book[@category='web']          Select book elements with category attribute 'web'
//book[price>35]                 Select book elements with price greater than 35
```
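
A minimal sketch of these predicates evaluated with lxml (assumed installed) against a trimmed copy of the bookstore sample:

```python
from lxml import etree

XML = """<bookstore>
  <book category="web"><title lang="en">XML Guide</title><price>39.95</price></book>
  <book category="database"><title lang="en">SQL Basics</title><price>29.99</price></book>
</bookstore>"""
root = etree.fromstring(XML)

first = root.xpath("/bookstore/book[1]/title/text()")        # positions are 1-based
last = root.xpath("/bookstore/book[last()]/title/text()")
first_two = root.xpath("/bookstore/book[position()<3]")
web = root.xpath("//book[@category='web']/title/text()")     # attribute test
expensive = root.xpath("//book[price>35]/title/text()")      # child value compared as a number

print(first, last, len(first_two), web, expensive)
```

Note that positions count from 1, not 0, and that `price>35` converts the text of the price element to a number before comparing.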

3. Wildcards

```xpath
*           Match any element node
@*          Match any attribute node
node()      Match any type of node
//book/*    Select all child elements of book elements
```
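
The wildcards behave as follows in a quick lxml experiment (a sketch that assumes lxml is installed; the sample is a trimmed copy of the bookstore document):

```python
from lxml import etree

XML = """<bookstore>
  <book category="web"><title lang="en">XML Guide</title><author>John Doe</author><price>39.95</price></book>
  <book category="database"><title lang="en">SQL Basics</title><author>Jane Smith</author><price>29.99</price></book>
</bookstore>"""
root = etree.fromstring(XML)

child_tags = [el.tag for el in root.xpath("/bookstore/book[1]/*")]  # any child element
title_attrs = root.xpath("//title/@*")                              # any attribute of title
title_nodes = root.xpath("/bookstore/book[1]/title/node()")         # any node kind under title

print(child_tags, title_attrs, title_nodes)
```

`*` matches elements only, while `node()` also matches text, comment, and processing instruction nodes, so the last query returns the title's text node.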

4. Axes

Axes define a set of nodes relative to the current node:

```xpath
ancestor              Select all ancestor nodes of the current node
ancestor-or-self      Select the current node and all its ancestor nodes
attribute             Select all attribute nodes of the current node
child                 Select all child nodes of the current node
descendant            Select all descendant nodes of the current node
descendant-or-self    Select the current node and all its descendant nodes
following             Select all nodes after the current node in document order, excluding its descendants
following-sibling     Select all sibling nodes after the current node
namespace             Select all namespace nodes of the current node
parent                Select the parent node of the current node
preceding             Select all nodes before the current node in document order, excluding its ancestors
preceding-sibling     Select all sibling nodes before the current node
self                  Select the current node itself
```
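
An axis is written as `axis::node-test` and is evaluated relative to a context node. The sketch below, assuming lxml is installed, picks the second title of a trimmed bookstore sample as the context node and walks a few axes from there.

```python
from lxml import etree

XML = """<bookstore>
  <book category="web"><title lang="en">XML Guide</title><author>John Doe</author><price>39.95</price></book>
  <book category="database"><title lang="en">SQL Basics</title><author>Jane Smith</author><price>29.99</price></book>
</bookstore>"""
root = etree.fromstring(XML)

# Context node: the title of the second book
title = root.xpath("//title[.='SQL Basics']")[0]

parent_category = title.xpath("parent::book/@category")
later_siblings = [el.tag for el in title.xpath("following-sibling::*")]
ancestor_tags = [el.tag for el in title.xpath("ancestor::*")]
earlier_titles = title.xpath("preceding::title/text()")
own_attrs = title.xpath("attribute::lang")

print(parent_category, later_siblings, ancestor_tags, earlier_titles, own_attrs)
```

The common abbreviations map onto these axes: `@lang` is `attribute::lang`, `..` is `parent::node()`, and `//` is `/descendant-or-self::node()/`.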

5. Operators

```xpath
Arithmetic operators:    +  -  *  div  mod
Comparison operators:    =  !=  <  >  <=  >=
Boolean operators:       and  or  (negation is the function not(), not an operator)
Union operator:          |
```
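
The operators can be exercised directly, since an XPath expression may return a number or boolean as well as a node set. This is a sketch assuming lxml is installed; the 10% discount is an invented example.

```python
from lxml import etree

XML = """<bookstore>
  <book category="web"><title lang="en">XML Guide</title><author>John Doe</author><price>39.95</price></book>
  <book category="database"><title lang="en">SQL Basics</title><author>Jane Smith</author><price>29.99</price></book>
</bookstore>"""
root = etree.fromstring(XML)

discounted = root.xpath("/bookstore/book[1]/price * 0.9")   # arithmetic on a node value
quotient = root.xpath("7 div 2")                            # division is 'div', not '/'
remainder = root.xpath("7 mod 3")
both = root.xpath("//book[@category='web' and price > 35]/title/text()")
either = root.xpath("//book[@category='database' or price > 35]")
not_web = root.xpath("//book[not(@category='web')]/title/text()")
union = root.xpath("//title | //author")                    # node-set union

print(discounted, quotient, remainder, both, len(either), not_web, len(union))
```

Division uses the keyword `div` because `/` is already taken as the path separator.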

XPath Functions

Node Set Functions

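```xpath
count(//book)      Count the number of book elements
id('b1')           Select the element whose ID-typed attribute is 'b1' (the attribute must be declared as type ID, typically in a DTD)
local-name()       Return the local name of the node (without namespace prefix)
namespace-uri()    Return the namespace URI of the node
name()             Return the qualified name of the node (including any prefix)
```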
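
The difference between `name()`, `local-name()`, and `namespace-uri()` only shows up in namespaced documents. The sketch below assumes lxml is installed; the `b` prefix and the `urn:example:books` namespace are invented for illustration.

```python
from lxml import etree

# Hypothetical namespaced document
doc = etree.fromstring(
    '<b:bookstore xmlns:b="urn:example:books"><b:book/><b:book/></b:bookstore>'
)
ns = {"b": "urn:example:books"}  # prefix-to-URI mapping used by the queries

book_count = doc.xpath("count(//b:book)", namespaces=ns)  # numbers come back as float
qualified = doc.xpath("name(/*)")
local = doc.xpath("local-name(/*)")
uri = doc.xpath("namespace-uri(/*)")

print(book_count, qualified, local, uri)
```

`id()` is left out here because it only works when the parser knows which attributes have type ID.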

String Functions

```xpath
string()                             Convert a node to a string
concat('Hello', ' ', 'World')        Concatenate strings
starts-with(title, 'XML')            Check whether the string starts with the specified string
contains(title, 'XML')               Check whether the string contains the specified string
substring(title, 1, 3)               Extract 3 characters starting at position 1 (positions are 1-based)
string-length(title)                 Return the string length
normalize-space(title)               Trim leading and trailing whitespace and collapse internal runs to one space
translate(title, 'abc', 'XYZ')       Replace characters one for one: a with X, b with Y, c with Z
```
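
A few of the string functions in action, as a sketch assuming lxml is installed. Inside a predicate, `.` refers to the node being tested.

```python
from lxml import etree

XML = """<bookstore>
  <book category="web"><title lang="en">XML Guide</title></book>
  <book category="database"><title lang="en">SQL Basics</title></book>
</bookstore>"""
root = etree.fromstring(XML)

starts = root.xpath("//title[starts-with(., 'XML')]/text()")
has_sql = root.xpath("//title[contains(., 'SQL')]/text()")
prefix = root.xpath("substring(/bookstore/book[1]/title, 1, 3)")   # 1-based start, length 3
length = root.xpath("string-length(/bookstore/book[1]/title)")     # returned as a float
cleaned = root.xpath("normalize-space('  XML   Guide ')")
swapped = root.xpath("translate('abc-abc', 'abc', 'XYZ')")
joined = root.xpath("concat('Hello', ' ', 'World')")

print(starts, has_sql, prefix, length, cleaned, swapped, joined)
```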

Boolean Functions

```xpath
boolean()     Convert to boolean
not()         Logical NOT
true()        Return true
false()       Return false
lang('en')    Check whether the language of the context node (set by xml:lang) is 'en' or a sublanguage of it
```
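
One detail worth checking is that `lang()` reads the `xml:lang` attribute, not a plain `lang` attribute like the one in the bookstore sample. The sketch below assumes lxml is installed and uses an invented document to show the difference.

```python
from lxml import etree

# Hypothetical sample: two titles use xml:lang, one uses a plain lang attribute
doc = etree.fromstring(
    '<books>'
    '<title xml:lang="en-US">XML Guide</title>'
    '<title xml:lang="de">XML Leitfaden</title>'
    '<title lang="en">SQL Basics</title>'
    '</books>'
)

english = doc.xpath("//title[lang('en')]/text()")    # matches en and sublanguages such as en-US
has_french = doc.xpath("boolean(//title[lang('fr')])")  # empty node set converts to false
no_french = doc.xpath("not(//title[lang('fr')])")
always = doc.xpath("true()")

print(english, has_french, no_french, always)
```

The third title is not matched because its plain `lang` attribute does not set the language as far as XPath is concerned.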

Number Functions

```xpath
number()        Convert to number
sum(//price)    Calculate sum
floor(3.7)      Round down
ceiling(3.2)    Round up
round(3.7)      Round to nearest integer
```
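
A short sketch of the number functions, assuming lxml is installed. All XPath 1.0 numbers are floating point, so the results come back as Python floats.

```python
import math

from lxml import etree

XML = """<bookstore>
  <book category="web"><price>39.95</price></book>
  <book category="database"><price>29.99</price></book>
</bookstore>"""
root = etree.fromstring(XML)

total = root.xpath("sum(//price)")        # each price is converted to a number, then added
floored = root.xpath("floor(3.7)")
ceiled = root.xpath("ceiling(3.2)")
rounded = root.xpath("round(3.7)")
half_up = root.xpath("round(2.5)")        # ties go toward positive infinity
not_a_number = root.xpath("number('abc')")  # non-numeric strings convert to NaN

print(total, floored, ceiled, rounded, half_up, math.isnan(not_a_number))
```

Note that `round(2.5)` gives 3 in XPath, whereas Python's built-in `round(2.5)` gives 2.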

Practical Application Examples of XPath

1. Using XPath in Java

```java
import java.io.File;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class XPathExample {
    public static void main(String[] args) throws Exception {
        // Parse XML document
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = docFactory.newDocumentBuilder();
        Document doc = builder.parse(new File("books.xml"));

        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();

        // Execute XPath query; the two-argument evaluate() returns a String
        String expression = "//book[@category='web']/title/text()";
        String title = xpath.evaluate(expression, doc);
        System.out.println("Title: " + title);

        // Get node list; the three-argument evaluate() returns Object, so a cast is required
        NodeList books = (NodeList) xpath.evaluate("//book", doc, XPathConstants.NODESET);
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            System.out.println(book.getAttribute("category"));
        }
    }
}
```

2. Using XPath in Python (lxml)

```python
from lxml import etree

# Parse XML document
tree = etree.parse("books.xml")

# Execute XPath query
titles = tree.xpath("//book[@category='web']/title/text()")
for title in titles:
    print(f"Title: {title}")

# Get attributes
categories = tree.xpath("//book/@category")
for category in categories:
    print(f"Category: {category}")

# Use functions: XPath's sum() converts each price to a number and returns a float
# (text() results are strings, so Python's built-in sum() would raise TypeError on them)
total_price = tree.xpath("sum(//book/price)")
print(f"Total price: {total_price}")
```

3. Using XPath in JavaScript

```javascript
// Parse XML (xmlString is assumed to hold the XML text)
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(xmlString, "text/xml");

// Execute XPath query
const result = xmlDoc.evaluate(
  "//book[@category='web']/title",
  xmlDoc,
  null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
  null
);

for (let i = 0; i < result.snapshotLength; i++) {
  const node = result.snapshotItem(i);
  console.log(node.textContent);
}
```

Relationship Between XPath and XQuery

XQuery is a query language built on XPath that extends XPath's capabilities:

  • XPath: Used for locating and selecting nodes
  • XQuery: Used for querying, transforming, and constructing XML data

XPath Best Practices

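  1. Prefer explicit paths: When the document structure is fixed, a full path such as /bookstore/book/title lets the engine follow only the named steps instead of scanning every descendant
  2. Avoid unnecessary //: A leading // visits every node in the document, which gets slow on large documents; anchor the search to a known ancestor where possible
  3. Use predicates for filtering: Filter nodes early to reduce the amount of data processed
  4. Utilize indexes: For large documents, consider an engine that supports indexes, such as a native XML database; in-memory DOM evaluators typically do not
  5. Cache compiled expressions and results: For frequently executed queries, compile the expression once and cache results that do not change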
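
As one way to apply the first and last points, lxml lets an expression be compiled once with `etree.XPath` and then called repeatedly. This is a sketch assuming lxml is installed; the helper name `titles_by_category` and the XPath variable `$cat` are chosen for illustration.

```python
from lxml import etree

XML = """<bookstore>
  <book category="web"><title lang="en">XML Guide</title></book>
  <book category="database"><title lang="en">SQL Basics</title></book>
</bookstore>"""
root = etree.fromstring(XML)

# Compile once and reuse; the explicit path avoids a document-wide // scan.
# $cat is an XPath variable supplied as a keyword argument on each call.
titles_by_category = etree.XPath("/bookstore/book[@category=$cat]/title/text()")

web_titles = titles_by_category(root, cat="web")
db_titles = titles_by_category(root, cat="database")

print(web_titles, db_titles)
```

Passing values as XPath variables also avoids building expressions by string concatenation, which keeps quoting problems out of the query.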

XPath is a powerful tool for processing XML data. Mastering XPath can greatly improve the efficiency and flexibility of XML data processing.

Tags: XML