
What are the differences between Cheerio and jsdom? How do you choose between them?

February 22, 14:31

Cheerio and jsdom are both tools for handling HTML/XML in Node.js, but they have significant differences in design philosophy and implementation. Here's a detailed comparison:

1. Core Architecture Comparison

Cheerio

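  • Type: HTML/XML parser with a jQuery-style manipulation API
  • Underlying Implementation: parse5 by default for HTML; htmlparser2 for XML mode or as a faster, more forgiving opt-in
  • DOM Implementation: Lightweight node tree (domhandler), not a browser DOM
  • JavaScript Execution: Not supported
  • Browser Environment Simulation: None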

jsdom

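  • Type: DOM and browser environment emulator
  • Underlying Implementation: parse5 for parsing, plus a pure-JavaScript implementation of the WHATWG DOM and HTML standards
  • DOM Implementation: Standards-based DOM covering most of the WHATWG DOM and HTML specifications
  • JavaScript Execution: Supported, but disabled by default; enabled through the runScripts option
  • Browser Environment Simulation: Partial; no layout, rendering, or navigation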

2. Feature Comparison Table

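| Feature | Cheerio | jsdom |
| --- | --- | --- |
| HTML Parsing | ✅ Fast | ✅ Standards-compliant |
| CSS Selectors | ✅ jQuery style | ✅ Standard (querySelector / querySelectorAll) |
| DOM Manipulation | ✅ jQuery-style subset | ✅ Standard DOM API |
| JavaScript Execution | ❌ Not supported | ✅ Supported (opt-in via runScripts) |
| Event Handling | ❌ Not supported | ✅ Supported |
| Performance | ⚡ Fast | 🐢 Slower |
| Memory Usage | 📉 Low | 📈 High |
| Browser APIs | ❌ None | ⚠️ Partial (no layout, navigation, or fetch) |
| Network Requests | ❌ None | ⚠️ Subresource loading (opt-in via resources) and XMLHttpRequest |
| Canvas | ❌ None | ⚠️ Only with the optional canvas package installed |
| LocalStorage | ❌ None | ✅ Supported (requires a url option) |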

3. Usage Example Comparison

Cheerio Usage Example

```javascript
const cheerio = require('cheerio');

const html = `
<div id="container">
  <p class="text">Hello World</p>
  <button onclick="alert('Clicked')">Click</button>
</div>
`;

const $ = cheerio.load(html);

// Basic operations
// .text() concatenates the text of all descendants, so this includes
// both "Hello World" and the button label "Click"
console.log($('#container').text().trim());
console.log($('.text').text()); // "Hello World"

// DOM manipulation
$('.text').addClass('highlight');
console.log($('.text').attr('class')); // "text highlight"

// No event support: Cheerio has no .click() method, so calling it would throw
console.log(typeof $('button').click); // "undefined"
```

jsdom Usage Example

```javascript
const { JSDOM } = require('jsdom');

const html = `
<div id="container">
  <p class="text">Hello World</p>
  <button onclick="alert('Clicked')">Click</button>
</div>
`;

const dom = new JSDOM(html);
const document = dom.window.document;

// Basic operations
// textContent includes all descendant text: "Hello World" and "Click"
console.log(document.getElementById('container').textContent.trim());
console.log(document.querySelector('.text').textContent); // "Hello World"

// DOM manipulation
document.querySelector('.text').classList.add('highlight');
console.log(document.querySelector('.text').className); // "text highlight"

// Events go through the standard DOM event model.
// The inline onclick attribute is ignored here because scripts are disabled
// unless runScripts is set, so a listener is attached from Node.js instead.
const button = document.querySelector('button');
button.addEventListener('click', () => console.log('Clicked'));
button.click(); // logs "Clicked"

// Browser-like APIs
console.log(dom.window.innerWidth); // Fixed default value; jsdom performs no layout
console.log(dom.window.location.href); // "about:blank" unless a url option is passed
```

4. Performance Comparison

Parsing Speed Test

```javascript
const { performance } = require('perf_hooks');
const cheerio = require('cheerio');
const { JSDOM } = require('jsdom');

const largeHtml = '<div>' + '<p>Test</p>'.repeat(10000) + '</div>';

// Cheerio performance test
const start1 = performance.now();
const $ = cheerio.load(largeHtml);
const cheerioTime = performance.now() - start1;
console.log(`Cheerio: ${cheerioTime.toFixed(1)}ms`);

// jsdom performance test
const start2 = performance.now();
const dom = new JSDOM(largeHtml);
const jsdomTime = performance.now() - start2;
console.log(`jsdom: ${jsdomTime.toFixed(1)}ms`);

// Absolute numbers depend on hardware, Node.js version, and library version.
// A single run is noisy; expect Cheerio to be noticeably faster than jsdom,
// which also has to build a full window and document.
```
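A single timed run like the one above is sensitive to JIT warm-up and garbage collection pauses. The following is a minimal sketch of a more repeatable measurement, assuming a hypothetical `benchmark` helper (not part of Cheerio or jsdom) that takes any zero-argument function. The workload shown is a stand-in string operation so the sketch runs without either library installed.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper: run `fn` several times and report the median
// wall-clock time in milliseconds. The untimed warm-up runs give the JIT
// a chance to compile hot paths before measurement starts.
function benchmark(fn, { runs = 10, warmup = 3 } = {}) {
  for (let i = 0; i < warmup; i++) fn();

  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }

  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  const median = samples.length % 2 === 1
    ? samples[mid]
    : (samples[mid - 1] + samples[mid]) / 2;

  return { median, min: samples[0], max: samples[samples.length - 1], samples };
}

// Stand-in workload so the sketch runs without any parser installed.
const largeHtml = '<div>' + '<p>Test</p>'.repeat(10000) + '</div>';
const result = benchmark(() => largeHtml.split('<p>').length, { runs: 5 });
console.log(`median: ${result.median.toFixed(3)}ms`);
```

To compare the two libraries, the same helper could be called with `() => cheerio.load(largeHtml)` and `() => new JSDOM(largeHtml)` in place of the stand-in workload.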

Memory Usage Comparison

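```javascript
const cheerio = require('cheerio');
const { JSDOM } = require('jsdom');

const largeHtml = '<div>' + '<p>Test</p>'.repeat(10000) + '</div>';

// Cheerio: builds only a lightweight node tree
function cheerioMemoryTest() {
  const $ = cheerio.load(largeHtml);
  const elements = $('p');
  return elements.length;
}

// jsdom: builds a full window and document for every instance
function jsdomMemoryTest() {
  const dom = new JSDOM(largeHtml);
  const elements = dom.window.document.querySelectorAll('p');
  const count = elements.length;
  dom.window.close(); // Release timers and other window resources
  return count;
}
```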
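The two functions above build the documents but do not measure anything. A rough sketch of how heap growth could be observed, assuming a hypothetical `measureHeap` helper built on Node's `process.memoryUsage()`. Heap readings are approximate and noisy unless Node.js is started with `--expose-gc`, and the workload below is a stand-in so the sketch runs without either library.

```javascript
// Hypothetical helper: report how much the V8 heap grew while `fn` ran.
function measureHeap(fn) {
  // globalThis.gc exists only when Node.js is started with --expose-gc
  if (typeof globalThis.gc === 'function') globalThis.gc();

  const before = process.memoryUsage().heapUsed;
  const value = fn(); // Keep the result referenced so it is not collected early
  const after = process.memoryUsage().heapUsed;

  return { value, heapDeltaBytes: after - before };
}

// Stand-in workload: allocate 10,000 small node-like objects.
const { value, heapDeltaBytes } = measureHeap(() =>
  Array.from({ length: 10000 }, (_, i) => ({ tag: 'p', text: `Test ${i}` }))
);
console.log(`${value.length} nodes, heap delta: ${(heapDeltaBytes / 1024).toFixed(1)} KiB`);
```

Passing `() => cheerio.load(largeHtml)` or `() => new JSDOM(largeHtml)` as `fn` would give a first-order comparison; averaging several runs is advisable because the garbage collector can run at any point.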

5. Use Case Comparison

Scenarios for Using Cheerio

```javascript
const axios = require('axios');
const cheerio = require('cheerio');

// 1. Web scraping and data extraction
async function scrapeWebsite() {
  const response = await axios.get('https://example.com');
  const $ = cheerio.load(response.data);

  return {
    title: $('title').text(),
    links: $('a').map((i, el) => $(el).attr('href')).get()
  };
}

// 2. HTML content processing
function processHtml(html) {
  const $ = cheerio.load(html);
  $('script').remove(); // Remove scripts
  $('style').remove(); // Remove styles
  return $.html();
}

// 3. Batch processing large numbers of documents
function batchProcess(htmlList) {
  return htmlList.map(html => {
    const $ = cheerio.load(html);
    return $('title').text();
  });
}
```
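In a large batch, one unexpected input can abort the whole `map` call. Below is a minimal sketch of a batch wrapper that records failures per document instead, assuming hypothetical names `batchExtract` and `titleOf`. The regex-based extractor is only a stand-in so the sketch runs without dependencies; it is not a substitute for a real parser.

```javascript
// Hypothetical helper: apply `extract` to every document and record
// failures per item instead of aborting the whole batch.
function batchExtract(htmlList, extract) {
  return htmlList.map((html, index) => {
    try {
      return { index, ok: true, value: extract(html) };
    } catch (error) {
      return { index, ok: false, error: error.message };
    }
  });
}

// Stand-in extractor (assumption for illustration only).
// With Cheerio this would be: html => cheerio.load(html)('title').text()
function titleOf(html) {
  if (typeof html !== 'string') throw new TypeError('expected an HTML string');
  const match = /<title>([^<]*)<\/title>/i.exec(html);
  return match ? match[1] : '';
}

const results = batchExtract(['<title>A</title>', null, '<p>no title</p>'], titleOf);
console.log(results);
```

Keeping the extractor as a parameter means the same wrapper works for either library, and failed items can be retried or logged separately.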

Scenarios for Using jsdom

```javascript
const { JSDOM } = require('jsdom');

// 1. Testing frontend code
function testFrontendCode() {
  const dom = new JSDOM(`
    <div id="app"></div>
    <script>
      document.getElementById('app').textContent = 'Hello';
    </script>
  `, { runScripts: 'dangerously' });

  console.log(dom.window.document.getElementById('app').textContent); // "Hello"
}

// 2. Server-side rendering (SSR)
function renderComponent(component) {
  const dom = new JSDOM('<div id="root"></div>');
  const root = dom.window.document.getElementById('root');

  // Execute component code
  component(root);

  return dom.serialize();
}

// 3. Processing content requiring JavaScript
// runScripts: 'dangerously' runs the page's scripts inside the Node.js process,
// so use it only with trusted HTML.
function processDynamicContent(html) {
  const dom = new JSDOM(html, {
    runScripts: 'dangerously',
    resources: 'usable' // Also load external scripts and other subresources
  });

  // Wait for the load event, which fires after subresources have loaded
  return new Promise(resolve => {
    dom.window.addEventListener('load', () => {
      resolve(dom.serialize());
    });
  });
}
```
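A promise that waits only on the load event never settles if that event does not fire. The following sketch adds a timeout guard, assuming a hypothetical `waitForEvent` helper that relies only on the standard `addEventListener` / `removeEventListener` interface, which a jsdom window provides. Node's built-in `EventTarget` stands in for the window so the sketch runs without jsdom.

```javascript
// Hypothetical helper: resolve with the event when `type` fires on `target`,
// or reject if it has not fired within `timeoutMs` milliseconds.
function waitForEvent(target, type, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      target.removeEventListener(type, onEvent);
      reject(new Error(`Timed out after ${timeoutMs}ms waiting for "${type}"`));
    }, timeoutMs);

    function onEvent(event) {
      clearTimeout(timer);
      resolve(event);
    }

    target.addEventListener(type, onEvent, { once: true });
  });
}

// Stand-in for dom.window: any EventTarget works.
const target = new EventTarget();
const pending = waitForEvent(target, 'load', 1000);
target.dispatchEvent(new Event('load'));
pending.then(event => console.log(`received: ${event.type}`));
```

With jsdom, the call would look like `await waitForEvent(dom.window, 'load', 10000)` before serializing; the timeout value is an arbitrary choice that depends on how long the page's scripts are expected to take.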

6. API Comparison

Cheerio API Characteristics

```javascript
const cheerio = require('cheerio');

const html = '<div class="container"><a class="item class" href="/docs">Docs</a></div>';
const $ = cheerio.load(html);

// jQuery-style API
$('.class').text();
$('.class').html();
$('.class').attr('href');
$('.class').addClass('active');
$('.class').find('a');

// Chaining
$('.container')
  .find('.item')
  .addClass('highlight')
  .text();

// Unsupported browser APIs
$.window; // undefined
$.document; // undefined
$.localStorage; // undefined
```

jsdom API Characteristics

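```javascript
const { JSDOM } = require('jsdom');

const html = '<div class="class"><a href="/docs">Docs</a></div>';

// A url gives the document a real origin, which localStorage requires
const dom = new JSDOM(html, { url: 'https://example.org/' });
const document = dom.window.document;

// Standard DOM API
document.querySelector('.class').textContent;
document.querySelector('.class').innerHTML;
document.querySelector('.class').getAttribute('href');
document.querySelector('.class').classList.add('active');
document.querySelector('.class').querySelector('a');

// Supported browser APIs
dom.window.innerWidth;
dom.window.location.href; // "https://example.org/"
dom.window.localStorage;
dom.window.XMLHttpRequest;
dom.window.console;

// Not implemented by jsdom
dom.window.fetch; // undefined
```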

7. Selection Recommendations

Situations to Choose Cheerio

  1. Only need to parse and extract data

    • Web scraping
    • Data extraction
    • HTML content processing
  2. High performance requirements

    • Processing large amounts of documents
    • Batch operations
    • Real-time processing
  3. Limited resources

    • Limited memory
    • Limited CPU
    • Constrained environments such as serverless functions
  4. Don't need browser features

    • Don't need to execute JavaScript
    • Don't need event handling
    • Don't need browser APIs

Situations to Choose jsdom

  1. Need a browser-like environment

    • Frontend code testing
    • Server-side rendering
    • Component testing
  2. Need to execute JavaScript

    • Dynamic content processing
    • Client-side code execution
    • Framework rendering
  3. Need browser APIs

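    • localStorage and sessionStorage (require a url option)
    • XMLHttpRequest and WebSocket
    • Canvas (only with the optional canvas package installed)
    • Note: the Fetch API and Web Workers are not implemented by jsdom; use a polyfill or a real browser for those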
  4. Need standard DOM behavior

    • Event bubbling
    • DOM events
    • Spec-compliant DOM behavior in tests (not a substitute for cross-browser testing)
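The recommendations above can be condensed into a small decision helper. This is only an illustrative sketch: `chooseLibrary` and its option names are hypothetical, and real projects may weigh other factors such as bundle size or existing test tooling.

```javascript
// Hypothetical helper: pick a library from a set of requirement flags.
// Any requirement that needs a live DOM pushes the choice to jsdom;
// otherwise Cheerio is the lighter option.
function chooseLibrary({
  needsScriptExecution = false,
  needsEvents = false,
  needsBrowserApis = false
} = {}) {
  const reasons = [];
  if (needsScriptExecution) reasons.push('page scripts must run');
  if (needsEvents) reasons.push('DOM events are required');
  if (needsBrowserApis) reasons.push('browser APIs are required');

  return reasons.length > 0
    ? { library: 'jsdom', reasons }
    : { library: 'cheerio', reasons: ['static parsing and extraction only'] };
}

console.log(chooseLibrary().library); // "cheerio"
console.log(chooseLibrary({ needsEvents: true }).library); // "jsdom"
```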

8. Hybrid Usage Scenarios

```javascript
// First use jsdom to execute JavaScript, then use Cheerio to parse
const { JSDOM } = require('jsdom');
const cheerio = require('cheerio');

async function hybridProcess(html) {
  // 1. Use jsdom to execute JavaScript (trusted HTML only)
  const dom = new JSDOM(html, { runScripts: 'dangerously' });

  // Wait for the load event unless the document has already finished loading
  await new Promise(resolve => {
    if (dom.window.document.readyState === 'complete') {
      resolve();
    } else {
      dom.window.addEventListener('load', resolve);
    }
  });

  // 2. Get HTML after execution, then release the window
  const processedHtml = dom.serialize();
  dom.window.close();

  // 3. Use Cheerio for fast parsing
  const $ = cheerio.load(processedHtml);

  return {
    title: $('title').text(),
    content: $('.content').text()
  };
}
```

Summary

  • Cheerio: Lightweight and fast, focused on data extraction; suited to scraping and static HTML processing
  • jsdom: Standards-based browser emulation with optional script execution; suited to testing and dynamic content processing
  • Selection principle: Choose based on needs. Use Cheerio when the HTML is static and speed matters; use jsdom when scripts, events, or browser APIs are required
  • Hybrid usage: The two can be combined by running scripts in jsdom first, then parsing the serialized HTML with Cheerio