1. Install Cheerio:
First, make sure Cheerio is installed in your Node.js project. The examples below also use axios to fetch pages, so install both via npm if they are not already present:
```bash
npm install cheerio axios
```
2. Load HTML Content:
You can read local HTML files with Node.js's fs module, or fetch web page content with an HTTP client library such as axios. The example below uses axios to retrieve a page over HTTP:
```javascript
const axios = require('axios');
const cheerio = require('cheerio');

// Fetch the raw HTML of a page as a string
async function fetchHTML(url) {
  const { data } = await axios.get(url);
  return data;
}
```
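For the local-file route mentioned above, a minimal sketch might look like the following. The function name readLocalHTML is a hypothetical helper introduced here for illustration, and the file is assumed to be UTF-8 encoded.

```javascript
const fs = require('fs');

// Hypothetical helper: read a local HTML file and return its contents as a string.
// Assumes the file is UTF-8 encoded.
async function readLocalHTML(filePath) {
  return fs.promises.readFile(filePath, 'utf8');
}
```

Because it returns a promise that resolves to a string, just like fetchHTML, the result can be passed to cheerio.load in the same way.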
3. Use Cheerio to Extract <script> Tag Content:
After obtaining the HTML, load it with Cheerio and extract all <script> tags:
```javascript
async function extractScripts(url) {
  const html = await fetchHTML(url);
  const $ = cheerio.load(html);

  // Print the inline code of each <script> tag
  $('script').each((i, elem) => {
    console.log($(elem).html());
  });
}
```
In this function, $('script') selects all <script> tags, the each method iterates through them, and $(elem).html() retrieves the JavaScript code within each tag.
4. Call the Function:
Finally, invoke the extractScripts function with a URL:
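```javascript
// extractScripts is async, so handle a failed request or parse error explicitly
extractScripts('https://example.com').catch(console.error);
```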
Example Explanation:
Suppose we extract scripts from a simple HTML page with the following content:
```html
<!DOCTYPE html>
<html>
<head>
  <title>Example Page</title>
</head>
<body>
  <h1>Welcome to Example Page</h1>
  <script>
    console.log('This is inline JavaScript code');
  </script>
  <script src="example.js"></script>
</body>
</html>
```
Here, the extractScripts function prints console.log('This is inline JavaScript code'); (along with the whitespace surrounding it inside the tag), followed by an empty string, because the second <script> tag references an external file and contains no inline code.
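In this way, Cheerio lets you extract and process <script> tag content from a page's HTML, which is particularly useful for web scraping. Keep in mind that Cheerio only parses the markup and does not execute JavaScript, so scripts injected into the page at runtime will not appear in the results.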