Puppeteer 如何实现页面截图和 PDF 生成?有哪些高级选项和实际应用场景?
Puppeteer 提供了强大的页面截图和 PDF 生成功能,可以用于自动化测试、文档生成、网页归档等多种场景。1. 页面截图(Screenshots)基本截图:const puppeteer = require('puppeteer');(async () => { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com'); // 基本截图 await page.screenshot({ path: 'example.png' }); await browser.close();})();截图选项详解:await page.screenshot({ path: 'screenshot.png', // 保存路径 type: 'png', // 格式:'png' 或 'jpeg' quality: 90, // JPEG 质量(0-100),仅适用于 JPEG fullPage: true, // 截取整个页面(包括滚动部分) clip: { // 裁剪区域 x: 0, y: 0, width: 800, height: 600 }, omitBackground: false, // 是否省略白色背景(透明 PNG) encoding: 'base64', // 编码方式:'base64' 或 'binary' captureBeyondViewport: false // 是否捕获视口之外的内容});截取特定元素:const element = await page.$('#header');await element.screenshot({ path: 'header.png' });截取视口区域:await page.setViewport({ width: 1920, height: 1080 });await page.screenshot({ path: 'viewport.png' });全页截图(包括滚动内容):await page.screenshot({ path: 'fullpage.png', fullPage: true});高质量 JPEG 截图:await page.screenshot({ path: 'high-quality.jpg', type: 'jpeg', quality: 95});透明背景截图:await page.screenshot({ path: 'transparent.png', omitBackground: true});获取截图为 Base64:const base64 = await page.screenshot({ encoding: 'base64'});console.log(base64);2. PDF 生成基本 PDF 生成:await page.pdf({ path: 'page.pdf' });PDF 选项详解:await page.pdf({ path: 'output.pdf', // 保存路径 scale: 1, // 缩放比例 displayHeaderFooter: false, // 是否显示页眉页脚 headerTemplate: '', // 页眉 HTML 模板 footerTemplate: '', // 页脚 HTML 模板 printBackground: false, // 是否打印背景图形 landscape: false, // 是否横向打印 pageRanges: '', // 打印页码范围,如 '1-5, 8, 11-13' format: 'A4', // 纸张格式 width: '', // 纸张宽度,如 '10in' height: '', // 纸张高度,如 '20in' margin: { // 页边距 top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' }, preferCSSPageSize: false // 是否使用 CSS 页面大小});支持的纸张格式:Letter: 8.5in x 11inLegal: 8.5in x 14inTabloid: 11in x 17inLedger: 17in x 11inA0: 33.1in x 46.8inA1: 23.4in x 33.1inA2: 16.5in x 23.4inA3: 11.7in x 16.5inA4: 8.27in x 11.7inA5: 5.83in x 8.27inA6: 4.13in x 5.83in横向 PDF:await page.pdf({ path: 'landscape.pdf', landscape: true, format: 'A4'});自定义纸张大小:await page.pdf({ path: 'custom.pdf', width: '200mm', height: '300mm'});设置页边距:await page.pdf({ path: 'margin.pdf', margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' }});打印背景图形:await page.pdf({ path: 'background.pdf', printBackground: true});添加页眉页脚:await page.pdf({ path: 'header-footer.pdf', displayHeaderFooter: true, headerTemplate: ` <div style="font-size: 10px; text-align: center; width: 100%;"> Generated by Puppeteer </div> `, footerTemplate: ` <div style="font-size: 10px; text-align: center; width: 100%;"> Page <span class="pageNumber"></span> of <span class="totalPages"></span> </div> `});打印特定页面:await page.pdf({ path: 'pages.pdf', pageRanges: '1-3, 5, 8-10'});3. 实际应用场景场景 1:网页归档async function archiveWebpage(url, outputPath) { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(url, { waitUntil: 'networkidle2' }); // 生成 PDF 归档 await page.pdf({ path: outputPath, format: 'A4', printBackground: true, margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' } }); await browser.close();}archiveWebpage('https://example.com', 'archive.pdf');场景 2:批量截图服务async function batchScreenshots(urls) { const browser = await puppeteer.launch(); const page = await browser.newPage(); for (const url of urls) { await page.goto(url, { waitUntil: 'networkidle2' }); const filename = url .replace(/https?:\/\//, '') .replace(/\//g, '_') + '.png'; await page.screenshot({ path: `screenshots/${filename}`, fullPage: true }); console.log(`Screenshot saved: ${filename}`); } await browser.close();}batchScreenshots([ 'https://example.com', 'https://example.com/about', 'https://example.com/contact']);场景 3:生成发票 PDFasync function generateInvoice(invoiceData) { const browser = await puppeteer.launch(); const page = await browser.newPage(); // 加载发票模板 await page.setContent(` <html> <head> <style> body { font-family: Arial, sans-serif; padding: 40px; } .header { text-align: center; margin-bottom: 40px; } .invoice-info { margin-bottom: 30px; } table { width: 100%; border-collapse: collapse; } th, td { border: 1px solid #ddd; padding: 10px; text-align: left; } th { background-color: #f2f2f2; } .total { text-align: right; font-weight: bold; margin-top: 20px; } </style> </head> <body> <div class="header"> <h1>INVOICE</h1> <p>Invoice #: ${invoiceData.number}</p> </div> <div class="invoice-info"> <p>Date: ${invoiceData.date}</p> <p>Customer: ${invoiceData.customer}</p> </div> <table> <thead> <tr> <th>Item</th> <th>Quantity</th> <th>Price</th> <th>Total</th> </tr> </thead> <tbody> ${invoiceData.items.map(item => ` <tr> <td>${item.name}</td> <td>${item.quantity}</td> <td>$${item.price}</td> <td>$${item.quantity * item.price}</td> </tr> `).join('')} </tbody> </table> <div class="total"> Total: $${invoiceData.total} </div> </body> </html> `); // 生成 PDF await page.pdf({ path: `invoice_${invoiceData.number}.pdf`, format: 'A4', printBackground: true, margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' } }); await browser.close();}generateInvoice({ number: 'INV-001', date: '2024-01-15', customer: 'John Doe', items: [ { name: 'Product A', quantity: 2, price: 50 }, { name: 'Product B', quantity: 1, price: 75 } ], total: 175});场景 4:响应式设计测试截图async function responsiveScreenshots(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); const viewports = [ { name: 'mobile', width: 375, height: 667 }, { name: 'tablet', width: 768, height: 1024 }, { name: 'desktop', width: 1920, height: 1080 } ]; for (const viewport of viewports) { await page.setViewport(viewport); await page.goto(url, { waitUntil: 'networkidle2' }); await page.screenshot({ path: `${viewport.name}.png`, fullPage: true }); console.log(`Screenshot saved: ${viewport.name}.png`); } await browser.close();}responsiveScreenshots('https://example.com');4. 性能优化建议1. 并行处理:const urls = ['url1', 'url2', 'url3'];const browser = await puppeteer.launch();await Promise.all(urls.map(async (url, index) => { const page = await browser.newPage(); await page.goto(url); await page.screenshot({ path: `screenshot-${index}.png` }); await page.close();}));await browser.close();2. 复用浏览器实例:const browser = await puppeteer.launch();// 多次使用同一个浏览器实例for (const url of urls) { const page = await browser.newPage(); await page.goto(url); await page.screenshot({ path: `${url}.png` }); await page.close();}await browser.close();3. 禁用不必要的资源:await page.setRequestInterception(true);page.on('request', (request) => { if (['image', 'font', 'media'].includes(request.resourceType())) { request.abort(); } else { request.continue(); }});5. 注意事项PDF 生成限制:PDF 生成功能仅在无头模式下可用字体支持:确保系统安装了所需的字体页面加载:使用 waitUntil: 'networkidle2' 确保页面完全加载内存管理:处理大量页面时注意内存使用错误处理:添加适当的错误处理逻辑超时设置:根据页面复杂度调整超时时间