What is XPath and how do you use it to query data in XML? - Interview Question

XPath (XML Path Language) is a language used to locate and select nodes in XML documents. It provides a concise and powerful way to query data in XML documents, similar to the role of SQL in relational databases.

Basic Concepts of XPath

Node Types

XPath treats XML documents as node trees, containing the following node types:

Element nodes: XML elements
Attribute nodes: Attributes of elements
Text nodes: Character data within elements (an attribute's value belongs to its attribute node, not to a text node)
Namespace nodes: Namespaces of elements
Processing instruction nodes: XML processing instructions
Comment nodes: XML comments
Document nodes: The root node of the entire document
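
To make the node tree concrete, the sketch below walks a small document with Python's standard-library `xml.dom.minidom` and records the kind of each node it meets. The sample string and the `describe` helper are illustrative assumptions, not part of the original example. The DOM is a close relative of the XPath data model rather than the same thing: it does not expose namespace nodes as separate nodes, so they do not appear here.

```python
from xml.dom import Node, minidom

# Hypothetical sample: a processing instruction, an element with an
# attribute, a comment, and a child element with text content.
SAMPLE = (
    '<?xml-stylesheet href="style.css"?>'
    '<book category="web"><!-- note --><title>XML Guide</title></book>'
)

NODE_TYPE_NAMES = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}


def describe(node, found):
    """Append the kind of `node`, its attributes, and its descendants."""
    found.append(NODE_TYPE_NAMES[node.nodeType])
    if node.nodeType == Node.ELEMENT_NODE:
        for i in range(node.attributes.length):
            found.append(NODE_TYPE_NAMES[node.attributes.item(i).nodeType])
    for child in node.childNodes:
        describe(child, found)
    return found


doc = minidom.parseString(SAMPLE)
kinds = describe(doc, [])
print(kinds)
```

The walk starts at the document node, which is the parent of both the processing instruction and the root element.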

XPath Syntax

1. Basic Path Expressions

xml
<!-- Example XML -->
<bookstore>
    <book category="web">
        <title lang="en">XML Guide</title>
        <author>John Doe</author>
        <price>39.95</price>
    </book>
    <book category="database">
        <title lang="en">SQL Basics</title>
        <author>Jane Smith</author>
        <price>29.99</price>
    </book>
</bookstore>

xpath
/ Select the document node (root node)
/bookstore Select root element bookstore
/bookstore/book Select all book elements under bookstore
//book Select all book elements in the document
bookstore//book Select all book elements in the descendants of bookstore
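
As a rough illustration, the path expressions above can be tried with Python's standard-library `xml.etree.ElementTree`. This is a sketch under one important assumption: ElementTree implements only a subset of XPath, and paths are evaluated relative to an element, so `/bookstore/book` is written as `./book` from the root element and `//book` as `.//book`.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """\
<bookstore>
    <book category="web">
        <title lang="en">XML Guide</title>
        <author>John Doe</author>
        <price>39.95</price>
    </book>
    <book category="database">
        <title lang="en">SQL Basics</title>
        <author>Jane Smith</author>
        <price>29.99</price>
    </book>
</bookstore>
"""

# fromstring() returns the root element, here <bookstore>
root = ET.fromstring(BOOKSTORE_XML)

# Counterpart of /bookstore/book: book elements that are children of the root
children = root.findall("./book")

# Counterpart of //book: book elements at any depth below the root
anywhere = root.findall(".//book")

# A longer child path: the title of every book
titles = [t.text for t in root.findall("./book/title")]

print(root.tag, len(children), len(anywhere), titles)
```

In this document both expressions select the same two elements, because every `book` is a direct child of `bookstore`.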

2. Predicates

Predicates are used to find specific nodes and are placed in square brackets []:

xpath
/bookstore/book[1] Select the first book element
/bookstore/book[last()] Select the last book element
/bookstore/book[position()<3] Select the first two book elements
//book[@category='web'] Select book elements with category attribute 'web'
//book[price>35] Select book elements with price greater than 35
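
The following sketch runs some of these predicates with the standard-library `xml.etree.ElementTree`, assuming the bookstore document shown earlier. ElementTree supports positional and attribute predicates but not numeric comparisons such as `price>35`, so that filter is done in Python here; a full XPath engine would evaluate it directly.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """\
<bookstore>
    <book category="web">
        <title lang="en">XML Guide</title>
        <price>39.95</price>
    </book>
    <book category="database">
        <title lang="en">SQL Basics</title>
        <price>29.99</price>
    </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

# book[1]: positions start at 1, not 0
first = root.find("./book[1]")

# book[last()]: the last book among its siblings
last = root.find("./book[last()]")

# book[@category='web']: filter on an attribute value
web = root.findall("./book[@category='web']")

# book[price>35] is outside ElementTree's XPath subset, so compare in Python
expensive = [
    book for book in root.findall("./book")
    if float(book.findtext("price")) > 35
]

print(first.findtext("title"), "|", last.findtext("title"))
print([b.findtext("title") for b in web])
print([b.findtext("title") for b in expensive])
```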

3. Wildcards

xpath
* Match any element node
@* Match any attribute node
node() Match any type of node
//book/* Select all child elements of book elements
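
A brief sketch of the element wildcard using the standard-library `xml.etree.ElementTree`, assuming the bookstore document from earlier. ElementTree does not evaluate `@*`, so the attribute dictionary `attrib` stands in for it here.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """\
<bookstore>
    <book category="web">
        <title lang="en">XML Guide</title>
        <author>John Doe</author>
        <price>39.95</price>
    </book>
    <book category="database">
        <title lang="en">SQL Basics</title>
        <author>Jane Smith</author>
        <price>29.99</price>
    </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

# Counterpart of //book/*: every child element of every book
child_tags = [elem.tag for elem in root.findall("./book/*")]

# .//* selects every element below the root, whatever its name
all_descendants = root.findall(".//*")

# Stand-in for @*: all attributes of the first book, as a dict
first_book_attrs = dict(root.find("./book").attrib)

print(child_tags)
print(len(all_descendants), first_book_attrs)
```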

4. Axes

Axes define a set of nodes relative to the current node:

xpath
ancestor Select all ancestor nodes of the current node
ancestor-or-self Select the current node and all its ancestor nodes
attribute Select all attribute nodes of the current node
child Select all child nodes of the current node
descendant Select all descendant nodes of the current node
descendant-or-self Select the current node and all its descendant nodes
following Select all nodes after the current node in document order, excluding its descendants
following-sibling Select all sibling nodes after the current node
namespace Select all namespace nodes of the current node
parent Select the parent node of the current node
preceding Select all nodes before the current node in document order, excluding its ancestors
preceding-sibling Select all sibling nodes before the current node
self Select the current node itself
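
Several axes have abbreviated forms that even limited XPath engines understand. The sketch below uses the standard-library `xml.etree.ElementTree`, which accepts only these abbreviations and not the full `axis::node-test` syntax; the comments give the unabbreviated expression each path corresponds to.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """\
<bookstore>
    <book category="web">
        <title lang="en">XML Guide</title>
    </book>
    <book category="database">
        <title lang="en">SQL Basics</title>
    </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

# child axis: "book" is short for child::book
children = root.findall("./book")

# descendant-or-self axis: "//" is short for /descendant-or-self::node()/
titles = root.findall(".//title")

# parent axis: ".." is short for parent::node()
parents_of_titles = root.findall(".//title/..")

# self axis: "." is short for self::node()
current = root.findall(".")

# attribute axis: "@lang" is short for attribute::lang
english_titles = root.findall(".//title[@lang='en']")

print([e.tag for e in parents_of_titles], current[0].tag, len(english_titles))
```

Axes without an abbreviation, such as `ancestor` or `following-sibling`, need an engine with full XPath support.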

5. Operators

xpath
Arithmetic operators: + - * div mod
Comparison operators: = != < > <= >=
Boolean operators: and or (negation is the function not(), not an operator)
Node-set operator: | (union of two node sets)

XPath Functions

Node Set Functions

xpath
count(//book) Count the number of book elements
id('b1') Select the element whose ID-typed attribute (as declared in the DTD) is 'b1'
local-name() Return the local name of the node
namespace-uri() Return the namespace URI of the node
name() Return the name of the node

String Functions

xpath
string() Convert node to string
concat('Hello', ' ', 'World') Concatenate strings
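starts-with(title, 'XML') Check whether the title's string value starts with 'XML'
contains(title, 'XML') Check whether the title's string value contains 'XML'
substring(title, 1, 3) Extract three characters starting at position 1 (positions are 1-based)
string-length(title) Return the length of the title's string value
normalize-space(title) Strip leading and trailing whitespace and collapse internal runs to one space
translate(title, 'abc', 'XYZ') Replace each a, b, c with X, Y, Z respectively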
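
For checking intuition about what these functions return, the sketch below pairs each one with a plain Python counterpart. It assumes an engine such as the standard-library `xml.etree.ElementTree`, which selects nodes but does not evaluate XPath functions, so the string work is done in Python. The padded sample string is made up for illustration.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """\
<bookstore>
    <book><title>XML Guide</title></book>
    <book><title>SQL Basics</title></book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)
titles = [t.text for t in root.findall(".//title")]

# //title[contains(., 'XML')]
with_xml = [t for t in titles if "XML" in t]

# //title[starts-with(., 'SQL')]
starting_with_sql = [t for t in titles if t.startswith("SQL")]

# concat('Hello', ' ', 'World')
greeting = "".join(["Hello", " ", "World"])

# substring('XML Guide', 1, 3): XPath counts from 1, Python slices from 0
prefix = "XML Guide"[0:3]

# string-length('XML Guide')
length = len("XML Guide")

# normalize-space('  XML   Guide '): split() also splits on Unicode
# whitespace that XPath ignores, which does not matter for this input
normalized = " ".join("  XML   Guide ".split())

# translate('cab', 'abc', 'XYZ'): per-character mapping a->X, b->Y, c->Z
translated = "cab".translate(str.maketrans("abc", "XYZ"))

print(with_xml, starting_with_sql, greeting, prefix, length)
print(repr(normalized), translated)
```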

Boolean Functions

xpath
boolean() Convert to boolean
not() Logical NOT
true() Return true
false() Return false
lang('en') Check whether the context node's language, as set by xml:lang, is 'en' or a sublanguage such as 'en-US'

Number Functions

xpath
number() Convert to number
sum(//price) Calculate sum
floor(3.7) Round down
ceiling(3.2) Round up
round(3.7) Round to nearest integer
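
The sketch below shows Python counterparts of `count()`, `sum()`, `floor()`, `ceiling()`, and `round()`, again assuming an engine such as the standard-library `xml.etree.ElementTree` that selects nodes but does not evaluate functions. The helper `xpath_round` is a hypothetical name. It exists because XPath 1.0 rounds a value halfway between two integers toward positive infinity, whereas Python's built-in `round()` rounds such a value to the nearest even integer.

```python
import math
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """\
<bookstore>
    <book><price>39.95</price></book>
    <book><price>29.99</price></book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

# count(//book)
book_count = len(root.findall(".//book"))

# sum(//price): each node's string value is converted to a number first
total_price = sum(float(p.text) for p in root.findall(".//price"))

# floor(3.7) and ceiling(3.2)
floored = math.floor(3.7)
ceiled = math.ceil(3.2)


def xpath_round(value):
    """Round like XPath 1.0 round(): ties go toward positive infinity."""
    return math.floor(value + 0.5)


print(book_count, round(total_price, 2), floored, ceiled)
print(xpath_round(3.7), xpath_round(2.5), round(2.5))
```

The last line shows the difference on a tie: `xpath_round(2.5)` gives 3, while Python's `round(2.5)` gives 2.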

Practical Application Examples of XPath

1. Using XPath in Java

java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class XPathExample {
    public static void main(String[] args) throws Exception {
        // Parse XML document
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = docFactory.newDocumentBuilder();
        Document doc = builder.parse(new File("books.xml"));

        // Create an XPath evaluator
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Execute XPath query; the two-argument evaluate() returns the result as a String
        String expression = "//book[@category='web']/title/text()";
        String title = xpath.evaluate(expression, doc);
        System.out.println("Title: " + title);

        // Get node list; the three-argument evaluate() returns Object, so cast it
        NodeList books = (NodeList) xpath.evaluate("//book", doc, XPathConstants.NODESET);
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            System.out.println(book.getAttribute("category"));
        }
    }
}

2. Using XPath in Python (lxml)

python
from lxml import etree

# Parse XML document
tree = etree.parse("books.xml")

# Execute XPath query
titles = tree.xpath("//book[@category='web']/title/text()")
for title in titles:
    print(f"Title: {title}")

# Get attributes
categories = tree.xpath("//book/@category")
for category in categories:
    print(f"Category: {category}")

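# Use functions: sum() is evaluated by the XPath engine and returns a float
# (Python's built-in sum() would fail here because text() yields strings)
total_price = tree.xpath("sum(//book/price)")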
print(f"Total price: {total_price}")

3. Using XPath in JavaScript

javascript
// Parse XML (xmlString is assumed to hold the XML text, e.g. the bookstore example)
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(xmlString, "text/xml");

// Execute XPath query
const result = xmlDoc.evaluate(
    "//book[@category='web']/title",
    xmlDoc,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
);

for (let i = 0; i < result.snapshotLength; i++) {
    const node = result.snapshotItem(i);
    console.log(node.textContent);
}

Relationship Between XPath and XQuery

XQuery is a query language built on XPath that extends XPath's capabilities:

XPath: Used for locating and selecting nodes
XQuery: Used for querying, transforming, and constructing XML data

XPath Best Practices

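Use explicit paths: When the document structure is fixed, spell out the full path (for example /bookstore/book/title) so the engine does not have to search the tree
Avoid unnecessary //: // visits every descendant of the context node, which can be slow on large documents
Use predicates for filtering: Filter nodes early in the path to reduce the number of nodes later steps must process
Utilize indexes: XPath itself defines no indexes; for large documents stored in an XML database, use the indexes that engine provides, if any
Cache query results: For queries executed frequently against an unchanged document, cache the results or reuse compiled expressions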
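
Two of these practices can be sketched with the standard library. The example assumes the bookstore document from earlier and that it does not change while the cache is alive. `titles_in_category` is a hypothetical helper, and it assumes the category value contains no quote characters.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

BOOKSTORE_XML = """\
<bookstore>
    <book category="web">
        <title lang="en">XML Guide</title>
    </book>
    <book category="database">
        <title lang="en">SQL Basics</title>
    </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

# Explicit path: only the children of the root and of each book are visited
explicit = root.findall("./book/title")

# Descendant search: every element below the root is visited
scanned = root.findall(".//title")

# With a fixed structure, both select the same elements in the same order
same_result = explicit == scanned


@lru_cache(maxsize=None)
def titles_in_category(category):
    """Return the titles in one category, cached per category value."""
    path = f"./book[@category='{category}']/title"
    return tuple(t.text for t in root.findall(path))


web_titles = titles_in_category("web")
web_titles_again = titles_in_category("web")  # served from the cache

print(same_result, web_titles, titles_in_category.cache_info().hits)
```

If the document is modified, the cache has to be cleared with `titles_in_category.cache_clear()`, otherwise stale results are returned.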

XPath is a powerful tool for processing XML data. Mastering XPath can greatly improve the efficiency and flexibility of XML data processing.