Performance optimization in Puppeteer is crucial for improving scraping efficiency, reducing resource consumption, and increasing testing speed. Here are some key optimization strategies and best practices.
1. Browser Launch Optimization
Use Appropriate Launch Arguments:
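```javascript
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch({
  headless: 'new', // New headless mode; recent Puppeteer versions select it with `headless: true`
  args: [
    '--no-sandbox', // Disables the Chromium sandbox; only use with trusted content
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage', // Avoid memory issues
    '--disable-accelerated-2d-canvas',
    '--disable-gpu',
    '--window-size=1920,1080'
  ]
});
```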
Reuse Browser Instance:
```javascript
// Bad practice: Launch new browser for each task
async function badApproach(urls) {
  for (const url of urls) {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url);
    await browser.close();
  }
}

// Good practice: Reuse browser instance
async function goodApproach(urls) {
  const browser = await puppeteer.launch();
  for (const url of urls) {
    const page = await browser.newPage();
    await page.goto(url);
    await page.close();
  }
  await browser.close();
}
```
2. Page Loading Optimization
Optimize waitUntil Options:
```javascript
// Choose ONE wait strategy based on needs; the four calls below are alternatives
await page.goto(url, {
  waitUntil: 'domcontentloaded' // Fastest, DOM loaded
});
await page.goto(url, {
  waitUntil: 'load' // Default, all resources loaded
});
await page.goto(url, {
  waitUntil: 'networkidle0' // No network connections for at least 500 ms
});
await page.goto(url, {
  waitUntil: 'networkidle2' // No more than 2 network connections for at least 500 ms
});
```
Disable Unnecessary Resources:
```javascript
await page.setRequestInterception(true);
page.on('request', (request) => {
  const resourceType = request.resourceType();
  // Block images, fonts, media, and stylesheets
  if (['image', 'font', 'media', 'stylesheet'].includes(resourceType)) {
    request.abort();
  } else {
    request.continue();
  }
});
```
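The interception handler above can be factored into a small factory so that each page chooses its own block list. This is a minimal sketch: `createRequestFilter` is a hypothetical helper name, and it relies only on the `resourceType()`, `abort()`, and `continue()` request methods already used above.

```javascript
// Hypothetical helper: builds a 'request' handler that aborts the given resource types.
function createRequestFilter(blockedTypes = ['image', 'font', 'media']) {
  const blocked = new Set(blockedTypes);
  return (request) => {
    if (blocked.has(request.resourceType())) {
      return request.abort();
    }
    return request.continue();
  };
}

// Usage with an existing Puppeteer page (assumes `page` was created elsewhere):
//   await page.setRequestInterception(true);
//   page.on('request', createRequestFilter(['image', 'font']));
```

Keeping the decision in one function makes it easy to allow stylesheets on pages where layout matters (for example, screenshots) and block them on pure data-extraction pages.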
Cache Strategy:
```javascript
// Enable cache (this is the default)
await page.setCacheEnabled(true);
// Disable cache (reload every time)
await page.setCacheEnabled(false);
```
3. Concurrent Processing
Use Promise.all for Parallel Processing:
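```javascript
// Placeholder URLs for illustration
const urls = ['https://example.com', 'https://example.org', 'https://example.net'];
const browser = await puppeteer.launch();

// Process multiple pages in parallel
await Promise.all(urls.map(async (url, index) => {
  const page = await browser.newPage();
  await page.goto(url);
  // Name files by index: a raw URL contains ':' and '/', which are not valid in a file name
  await page.screenshot({ path: `page-${index}.png` });
  await page.close();
}));

await browser.close();
```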
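When screenshot files should be named after the page they came from, the URL has to be sanitized first, because characters such as `:` and `/` are not safe in file names. The following is a minimal sketch; `urlToFilename` is a hypothetical helper, and it deliberately ignores the query string, so two URLs that differ only in their query would map to the same name.

```javascript
// Hypothetical helper: derives a file-system-safe name from a URL's host and path.
function urlToFilename(url, extension = 'png') {
  const { hostname, pathname } = new URL(url);
  const slug = `${hostname}${pathname}`
    .replace(/[^a-zA-Z0-9.-]+/g, '_') // collapse unsafe characters into underscores
    .replace(/^_+|_+$/g, '');         // trim leading and trailing underscores
  return `${slug}.${extension}`;
}

console.log(urlToFilename('https://example.com/docs/intro?x=1')); // prints example.com_docs_intro.png
```

The result can be passed directly as the `path` option of `page.screenshot()`.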
Control Concurrency Level:
```javascript
async function processWithConcurrency(urls, concurrency = 3) {
  const browser = await puppeteer.launch();
  const results = [];

  // Process the URLs in batches of `concurrency` pages
  for (let i = 0; i < urls.length; i += concurrency) {
    const batch = urls.slice(i, i + concurrency);
    const batchResults = await Promise.all(
      batch.map(async (url) => {
        const page = await browser.newPage();
        await page.goto(url);
        const data = await page.evaluate(() => document.body.innerText);
        await page.close();
        return data;
      })
    );
    results.push(...batchResults);
  }

  await browser.close();
  return results;
}
```
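Batching has one drawback: each batch waits for its slowest page before the next batch starts. A worker-style limiter keeps up to `limit` tasks in flight at all times instead. The sketch below is one possible way to do this; `mapWithConcurrency` is a hypothetical helper, not part of Puppeteer, and it works with any async worker function.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `limit` calls in flight.
// Results are returned in the same order as `items`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the list is exhausted.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Usage with Puppeteer (assumes `browser` and `urls` exist):
//   const texts = await mapWithConcurrency(urls, 3, async (url) => {
//     const page = await browser.newPage();
//     try {
//       await page.goto(url);
//       return await page.evaluate(() => document.body.innerText);
//     } finally {
//       await page.close();
//     }
//   });
```

If one worker rejects, the returned promise rejects as well, while the other runners finish the task they are currently awaiting; wrap the worker body in `try`/`catch` if a single failed URL should not abort the whole run.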
4. Memory Management
Close Pages Promptly:
```javascript
// Bad practice: Don't close pages
async function badMemoryUsage(urls) {
  const browser = await puppeteer.launch();
  for (const url of urls) {
    const page = await browser.newPage();
    await page.goto(url);
    // Memory keeps growing without closing pages
  }
  await browser.close();
}

// Good practice: Close pages promptly
async function goodMemoryUsage(urls) {
  const browser = await puppeteer.launch();
  for (const url of urls) {
    const page = await browser.newPage();
    await page.goto(url);
    await page.close(); // Close page promptly
  }
  await browser.close();
}
```
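Closing the page on the happy path is not enough if `page.goto()` or a later step throws, because the `page.close()` call is then skipped. One way to guard against this is a small wrapper that closes the page in a `finally` block. This is a sketch; `withPage` is a hypothetical helper that only assumes `browser.newPage()` and `page.close()`.

```javascript
// Hypothetical helper: opens a page, runs `task` with it, and always closes the page.
async function withPage(browser, task) {
  const page = await browser.newPage();
  try {
    return await task(page);
  } finally {
    await page.close(); // runs whether `task` resolved or threw
  }
}

// Usage (assumes `browser` exists):
//   const title = await withPage(browser, async (page) => {
//     await page.goto('https://example.com');
//     return page.title();
//   });
```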
Use Context Isolation:
```javascript
// Note: newer Puppeteer versions name this method browser.createBrowserContext()
const context = await browser.createIncognitoBrowserContext();
const page = await context.newPage();
// Operate on page
await context.close(); // Close context, clean up all resources
```
Clear Cookies and Storage:
```javascript
// Clear cookies
await page.deleteCookie(...await page.cookies());
// Clear all storage
await page.evaluate(() => {
  localStorage.clear();
  sessionStorage.clear();
});
```
5. Selector Optimization
Use Efficient Selectors:
```javascript
// Bad practice: Use generic selectors
const allDivs = await page.$$('div'); // Slow
// Good practice: Use specific selectors
const items = await page.$$('.item'); // Fast
// Better practice: Use ID selectors
const element = await page.$('#unique-id'); // Fastest
```
Avoid Repeated Queries:
```javascript
// Bad practice: Repeated queries
const title1 = await page.$eval('.title', el => el.textContent);
const title2 = await page.$eval('.title', el => el.textContent);

// Good practice: Cache element
const titleHandle = await page.$('.title');
const text1 = await titleHandle.evaluate(el => el.textContent);
const text2 = await titleHandle.evaluate(el => el.textContent);
```
6. Network Optimization
Use Local Chromium:
```javascript
// Use local Chromium if available
const browser = await puppeteer.launch({
  executablePath: '/path/to/local/chrome'
});
```
Set Timeout Values:
```javascript
// Set reasonable timeout values (in milliseconds)
await page.goto(url, { timeout: 30000 });
await page.waitForSelector('.element', { timeout: 5000 });
```
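For a step that does not accept a `timeout` option of its own, a generic wrapper built on `Promise.race` can impose one. The sketch below is an assumption-laden illustration: `withTimeout` is a hypothetical helper, and it only stops waiting; it does not cancel the underlying operation.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is cleared in either case.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (assumes `page` exists):
//   const text = await withTimeout(
//     page.evaluate(() => document.body.innerText),
//     5000,
//     'text extraction'
//   );
```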
Use Connection Pool:
```javascript
// Reuse browser instance as connection pool
class BrowserPool {
  constructor(size = 3) {
    this.size = size;
    this.browsers = []; // Idle browsers ready to be handed out
    this.queue = [];    // Pending getBrowser() callers waiting for a browser
  }

  async init() {
    for (let i = 0; i < this.size; i++) {
      this.browsers.push(await puppeteer.launch());
    }
  }

  async getBrowser() {
    if (this.browsers.length > 0) {
      return this.browsers.pop();
    }
    // No idle browser: wait until one is released
    return new Promise(resolve => this.queue.push(resolve));
  }

  releaseBrowser(browser) {
    if (this.queue.length > 0) {
      // Hand the browser directly to the longest-waiting caller
      this.queue.shift()(browser);
    } else {
      this.browsers.push(browser);
    }
  }
}
```
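The pool above only works if every borrowed browser is returned, including when a task fails. A small wrapper can enforce that. This is a sketch: `withBrowser` is a hypothetical helper that assumes nothing beyond the `getBrowser()` and `releaseBrowser()` methods defined on `BrowserPool`.

```javascript
// Hypothetical helper: borrows a browser from the pool and always gives it back.
async function withBrowser(pool, task) {
  const browser = await pool.getBrowser();
  try {
    return await task(browser);
  } finally {
    pool.releaseBrowser(browser); // returned even if `task` throws
  }
}

// Usage (assumes a BrowserPool instance on which `await pool.init()` has completed):
//   const html = await withBrowser(pool, async (browser) => {
//     const page = await browser.newPage();
//     try {
//       await page.goto('https://example.com');
//       return await page.content();
//     } finally {
//       await page.close();
//     }
//   });
```

Without the `finally` block, a single thrown error would permanently shrink the pool, and later `getBrowser()` callers could wait forever.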
7. Practical Optimization Cases
Case 1: Batch Screenshot Optimization
```javascript
async function optimizedBatchScreenshots(urls) {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });

  // Parallel processing
  await Promise.all(urls.map(async (url, index) => {
    const page = await browser.newPage();

    // Disable unnecessary resources (interception must be set up on each page)
    await page.setRequestInterception(true);
    page.on('request', (request) => {
      if (['image', 'font', 'media'].includes(request.resourceType())) {
        request.abort();
      } else {
        request.continue();
      }
    });

    await page.goto(url, { waitUntil: 'domcontentloaded' });
    await page.screenshot({ path: `screenshot-${index}.png` });
    await page.close();
  }));

  await browser.close();
}
```
Case 2: Data Scraping Optimization
```javascript
async function optimizedScraping(urls) {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage'
    ]
  });
  const results = [];

  for (const url of urls) {
    const page = await browser.newPage();

    // Disable image loading
    await page.setRequestInterception(true);
    page.on('request', (request) => {
      if (request.resourceType() === 'image') {
        request.abort();
      } else {
        request.continue();
      }
    });

    // Fast loading
    await page.goto(url, { waitUntil: 'domcontentloaded' });

    // Batch data retrieval: one evaluate() call instead of one per element
    const data = await page.evaluate(() => {
      return Array.from(document.querySelectorAll('.item')).map(item => ({
        title: item.querySelector('.title')?.textContent,
        price: item.querySelector('.price')?.textContent
      }));
    });

    results.push(...data);
    await page.close();
  }

  await browser.close();
  return results;
}
```
Case 3: Monitoring and Performance Analysis
```javascript
async function monitorPerformance(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Enable performance monitoring
  const client = await page.target().createCDPSession();
  await client.send('Performance.enable');

  const startTime = Date.now();
  await page.goto(url, { waitUntil: 'networkidle2' });
  const loadTime = Date.now() - startTime;

  // Get performance metrics
  const metrics = await client.send('Performance.getMetrics');
  console.log('Load time:', loadTime);
  console.log('Metrics:', metrics);

  await browser.close();
}
```
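`Performance.getMetrics` responds with an object whose `metrics` property is an array of `{ name, value }` pairs, which is awkward to read or compare between runs. The sketch below flattens it into a plain object; `metricsToObject` is a hypothetical helper, and the sample values are made up purely to show the shape.

```javascript
// Hypothetical helper: flattens a Performance.getMetrics response into { name: value }.
function metricsToObject(response) {
  const flat = {};
  for (const { name, value } of response.metrics) {
    flat[name] = value;
  }
  return flat;
}

// Made-up sample in the shape the protocol returns; the numbers are illustrative only.
const sample = {
  metrics: [
    { name: 'Nodes', value: 120 },
    { name: 'JSHeapUsedSize', value: 2500000 }
  ]
};

console.log(metricsToObject(sample).JSHeapUsedSize); // prints 2500000
```

In `monitorPerformance`, the same function could be applied to the `metrics` variable before logging it.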
8. Performance Monitoring Tools
Use Chrome DevTools Protocol:
```javascript
const client = await page.target().createCDPSession();

// Enable performance monitoring
await client.send('Performance.enable');

// Get performance metrics
const metrics = await client.send('Performance.getMetrics');

// Enable network monitoring
await client.send('Network.enable');

// Listen to network events
client.on('Network.requestWillBeSent', (params) => {
  console.log('Request:', params.request.url);
});
```
Use Puppeteer's Performance Tracing:
```javascript
// Start tracing
await page.tracing.start({ path: 'trace.json' });
// Execute operations
await page.goto('https://example.com');
// Stop tracing
await page.tracing.stop();
```
9. Best Practices Summary
1. Launch Optimization:
- Use `headless: 'new'` mode
- Add appropriate launch arguments
- Reuse browser instances
2. Loading Optimization:
- Choose appropriate `waitUntil` strategy
- Disable unnecessary resources
- Use caching
3. Concurrency Optimization:
- Use `Promise.all` for parallel processing
- Control concurrency level
- Use connection pools
4. Memory Optimization:
- Close pages and browsers promptly
- Use context isolation
- Clear cookies and storage
5. Selector Optimization:
- Use efficient selectors
- Avoid repeated queries
- Cache element references
6. Network Optimization:
- Set reasonable timeout values
- Use local Chromium
- Optimize network requests
10. Common Performance Issues and Solutions
Issue 1: Memory Leaks
```javascript
// Solution: Clean up resources promptly
async function fixMemoryLeak() {
  const browser = await puppeteer.launch();
  try {
    // Operation code
  } finally {
    await browser.close();
  }
}
```
Issue 2: Slow Page Loading
```javascript
// Solution: Optimize loading strategy
await page.goto(url, {
  waitUntil: 'domcontentloaded',
  timeout: 10000
});
```
Issue 3: High Concurrency Causing Crashes
```javascript
// Solution: Limit concurrency
const CONCURRENCY = 3;
// Use connection pool or batch processing
```
Issue 4: High CPU Usage
```javascript
// Solution: Disable unnecessary features
const browser = await puppeteer.launch({
  args: [
    '--disable-gpu',
    '--disable-dev-shm-usage'
  ]
});
```