
How does Cheerio load HTML content? What are the different loading methods?

February 22, 14:30

Cheerio provides multiple methods for loading HTML content, suitable for different use cases:

1. Loading from HTML String

The most common approach is to pass an HTML string directly to cheerio.load(), which parses it and returns a $ function with a jQuery-like API:

```javascript
const cheerio = require('cheerio');

// Parse the markup; $ is a jQuery-like query function bound to this document
const $ = cheerio.load('<div class="content"><p>Hello</p></div>');
```

2. Loading from File

Read the HTML file into a string, then load it:

```javascript
const fs = require('fs');
const cheerio = require('cheerio');

// Read the whole file synchronously as a UTF-8 string
const html = fs.readFileSync('index.html', 'utf8');
const $ = cheerio.load(html);
```
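readFileSync blocks the event loop while the file is read. In a server or a crawler that handles many files, a non-blocking read is usually preferable. Below is a minimal sketch using fs.promises.readFile. The helper name loadHtmlFile is hypothetical. The loader (for example cheerio.load) is passed in as a parameter, so the helper itself depends only on Node's standard library.

```javascript
const fs = require('fs');

/**
 * Hypothetical helper: read an HTML file without blocking the event loop,
 * then pass its contents to a loader such as cheerio.load.
 * @param {string} filePath - path to the HTML file
 * @param {(html: string) => any} load - parsing function, e.g. cheerio.load
 * @returns {Promise<any>} whatever the loader returns
 */
async function loadHtmlFile(filePath, load) {
  // The 'utf8' encoding makes readFile resolve to a string instead of a Buffer
  const html = await fs.promises.readFile(filePath, 'utf8');
  return load(html);
}

// Usage (assumes cheerio is installed and index.html exists):
//   const cheerio = require('cheerio');
//   const $ = await loadHtmlFile('index.html', (html) => cheerio.load(html));
//   console.log($('title').text());
```

Injecting the loader also makes the helper easy to test with a stub in place of Cheerio.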

3. Loading with Configuration Options

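The cheerio.load() method accepts an options object as its second argument:

```javascript
const cheerio = require('cheerio');

const html = '<div class="content"><p>Fish &amp; Chips</p></div>';

const $ = cheerio.load(html, {
  // Parse the input with XML rules instead of HTML rules (default: false)
  xmlMode: false,
  // Decode entities such as &amp; into the characters they represent (default: true)
  decodeEntities: true,
  // An object here is passed to htmlparser2 as its options and makes Cheerio
  // parse with htmlparser2 instead of its default HTML parser (parse5 in Cheerio 1.0)
  xml: {
    xmlMode: false,
    decodeEntities: true,
  },
});
```

Older 0.x releases also documented withDomLvl1 (adds DOM Level 1 properties to nodes, default true) and normalizeWhitespace (collapses runs of whitespace, default false). Neither option is available in Cheerio 1.0.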
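To make the decodeEntities option concrete: when it is true, entity references in the source become the characters they stand for in the parsed text. The stand-alone illustration below covers only the five predefined XML entities. It is not Cheerio's implementation, which handles the full HTML entity table and numeric references. The function name decodeXmlEntities is hypothetical.

```javascript
// Illustration only: the five predefined XML entities and their characters
const XML_ENTITIES = {
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&apos;': "'",
  '&amp;': '&',
};

function decodeXmlEntities(text) {
  // A single regex pass, so "&amp;lt;" decodes to "&lt;" rather than "<"
  return text.replace(/&(?:lt|gt|quot|apos|amp);/g, (match) => XML_ENTITIES[match]);
}

// decodeXmlEntities('Fish &amp; Chips') returns 'Fish & Chips'
```

With decodeEntities: false, the parsed text keeps the raw reference (Fish &amp; Chips) instead.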

4. XML Mode Loading

Enable XML mode when processing XML documents, so that tag names keep their case and self-closing tags such as <item/> are recognized:

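```javascript
const cheerio = require('cheerio');

const xml = '<root><item>Value</item></root>';
// Cheerio 1.0 recommends the equivalent { xml: true } instead of xmlMode
const $ = cheerio.load(xml, { xmlMode: true });
```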

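5. Reading via a Stream (Combined with Node.js fs)

For large files, the HTML can be read as a stream so that reading does not block the event loop. However, cheerio.load() still needs the complete string, so the whole document ends up in memory either way:

```javascript
const fs = require('fs');
const cheerio = require('cheerio');

// With an encoding set, the stream emits strings and correctly handles
// multi-byte UTF-8 characters that are split across chunk boundaries
const stream = fs.createReadStream('large.html', { encoding: 'utf8' });
let html = '';

stream.on('data', (chunk) => {
  html += chunk;
});

stream.on('error', (err) => {
  console.error('Failed to read file:', err);
});

stream.on('end', () => {
  // The full document has been read; load it as usual
  const $ = cheerio.load(html);
  // Process DOM
});
```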
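Sometimes the stream comes from elsewhere, such as an HTTP response body, and you cannot set its encoding. In that case a Promise-based helper is often more convenient than the event handlers above. This is a minimal sketch that assumes Node.js 12 or later, where readable streams are async iterable. The helper name streamToString is hypothetical. It keeps the raw Buffer chunks and decodes them once at the end, so a multi-byte UTF-8 character split across two chunks is not corrupted.

```javascript
/**
 * Hypothetical helper: collect a readable stream into a single UTF-8 string.
 * @param {import('stream').Readable} stream
 * @returns {Promise<string>}
 */
async function streamToString(stream) {
  const chunks = [];
  // Async iteration rejects the returned promise if the stream emits 'error'
  for await (const chunk of stream) {
    // Streams with an encoding set emit strings; normalize everything to Buffers
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk, 'utf8') : chunk);
  }
  // Decode once, after all bytes are joined
  return Buffer.concat(chunks).toString('utf8');
}

// Usage (assumes cheerio is installed):
//   const html = await streamToString(fs.createReadStream('large.html'));
//   const $ = cheerio.load(html);
```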

Best Practices

  • For small to medium-sized HTML, use cheerio.load() directly
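  • cheerio.load() builds the entire DOM in memory, so for very large files consider chunked processing or a streaming parser such as htmlparser2's Parser
  • For XML documents, always set xmlMode: true (or xml: true in Cheerio 1.0)
  • Leave decodeEntities at its default (true) unless you need entities such as &amp; kept verbatim in the parsed text; in Cheerio 1.0 it mainly matters when htmlparser2 is used (XML mode or the xml option)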
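The size-based advice above can be enforced with a check before loading. This is a minimal sketch built around a hypothetical helper, readHtmlWithLimit, with an arbitrary 50 MiB default limit (an assumption; tune it to your memory budget). It reads the file size with fs/promises stat and refuses files above the limit, leaving those to a chunked or streaming approach.

```javascript
const { stat, readFile } = require('fs/promises');

// Arbitrary example threshold (an assumption): 50 MiB
const DEFAULT_MAX_BYTES = 50 * 1024 * 1024;

/**
 * Hypothetical helper: return a file's contents for cheerio.load() only if
 * it is small enough to hold in memory as a string plus a full DOM.
 * Larger files are rejected so the caller can switch strategies.
 */
async function readHtmlWithLimit(filePath, maxBytes = DEFAULT_MAX_BYTES) {
  const { size } = await stat(filePath);
  if (size > maxBytes) {
    throw new RangeError(
      `${filePath} is ${size} bytes (limit ${maxBytes}); use a streaming parser instead`
    );
  }
  return readFile(filePath, 'utf8');
}

// Usage (assumes cheerio is installed):
//   const $ = cheerio.load(await readHtmlWithLimit('page.html'));
```

Checking the size first avoids reading a huge file at all, rather than failing partway through parsing.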
Tags: NodeJS, Cheerio