Cheerio provides multiple methods for loading HTML content, suitable for different use cases:
1. Loading from HTML String
The most common approach is to pass an HTML string directly:
```javascript
const cheerio = require('cheerio');

const $ = cheerio.load('<div class="content"><p>Hello</p></div>');
```
2. Loading from File
Read the HTML file into a string, then load it:
```javascript
const fs = require('fs');
const cheerio = require('cheerio');

const html = fs.readFileSync('index.html', 'utf8');
const $ = cheerio.load(html);
```
3. Loading with Configuration Options
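The cheerio.load() method accepts an options object as its second parameter. Which options are honored depends on the Cheerio version and on the underlying parser:
```javascript
const $ = cheerio.load(html, {
  // Parse the input as XML instead of HTML
  xmlMode: false,
  // Decode HTML entities (e.g. &amp;) in text and attribute values
  decodeEntities: true,
  // Add DOM Level 1 properties to nodes (legacy option; may be ignored by current releases)
  withDomLvl1: false,
  // Collapse runs of whitespace in text nodes (legacy option; may be ignored by current releases)
  normalizeWhitespace: false,
  // Options passed through to htmlparser2; in Cheerio 1.x, setting `xml` selects the htmlparser2 parser
  xml: { xmlMode: false, decodeEntities: true }
});
```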
4. XML Mode Loading
Enable XML mode when processing XML documents:
```javascript
const xml = '<root><item>Value</item></root>';
const $ = cheerio.load(xml, { xmlMode: true });
```
5. Stream Processing (Combined with other libraries)
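Cheerio needs the complete document before it can build the DOM, so for large files, read the stream to the end and then load the accumulated string:
```javascript
const fs = require('fs');
const cheerio = require('cheerio');

// Passing an encoding makes each chunk a string and keeps multi-byte
// characters intact when they are split across chunk boundaries
const stream = fs.createReadStream('large.html', 'utf8');
let html = '';

stream.on('data', chunk => { html += chunk; });
stream.on('error', err => { console.error('Failed to read large.html:', err.message); });
stream.on('end', () => {
  const $ = cheerio.load(html);
  // Process DOM
});
```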
Best Practices
- For small to medium-sized HTML, use `cheerio.load()` directly
- For large files, consider chunked processing or use specialized streaming parsers
- For XML documents, always set `xmlMode: true`
- Adjust the `decodeEntities` option based on requirements