乐闻世界logo
搜索文章和话题

Puppeteer

Puppeteer 是一个 Node.js 库,它提供了一个高级 API 来通过 DevTools 协议控制无头 Chrome 或 Chromium。它还可以配置为使用完整(非无头)Chrome 或 Chromium。
Puppeteer
查看更多相关内容
Cheerio 和 Puppeteer 有什么区别?如何选择使用?Cheerio 和 Puppeteer 都是 Node.js 中用于处理网页的工具,但它们的设计目标和使用场景有显著差异: ## 1. 核心区别 | 特性 | Cheerio | Puppeteer | |------|---------|-----------| | **类型** | HTML 解析器 | 浏览器自动化工具 | | **JavaScript 执行** | 不支持 | 完全支持 | | **动态内容** | 无法处理 | 完全支持 | | **性能** | 极快 | 较慢 | | **资源消耗** | 低 | 高 | | **API** | jQuery 风格 | 浏览器 DevTools 协议 | | **使用场景** | 静态 HTML 解析 | 动态网页、截图、PDF | ## 2. Cheerio 的特点 ### 优势 - **轻量快速**:核心代码只有几百行,解析速度极快 - **简单易用**:jQuery 风格的 API,学习成本低 - **低资源消耗**:不需要启动浏览器,内存占用少 - **适合批量处理**:可以快速处理大量静态页面 ### 局限性 - **无法执行 JavaScript**:只能解析静态 HTML - **无法处理动态内容**:无法获取通过 JS 动态加载的数据 - **无法处理复杂交互**:不支持点击、滚动等用户操作 - **无法截图或生成 PDF**:没有可视化能力 ### 适用场景 ```javascript // 适合:静态网页数据提取 const cheerio = require('cheerio'); const axios = require('axios'); async function scrapeStaticSite() { const response = await axios.get('https://example.com'); const $ = cheerio.load(response.data); return { title: $('title').text(), links: $('a').map((i, el) => $(el).attr('href')).get() }; } ``` ## 3. Puppeteer 的特点 ### 优势 - **完整浏览器环境**:使用真实的 Chrome/Chromium - **JavaScript 执行**:可以执行页面中的所有 JavaScript - **动态内容支持**:可以获取 AJAX 加载的数据 - **交互能力**:支持点击、输入、滚动等操作 - **可视化功能**:支持截图、生成 PDF - **网络拦截**:可以监控和修改网络请求 ### 局限性 - **资源消耗大**:需要启动完整的浏览器实例 - **速度较慢**:相比 Cheerio 慢很多 - **复杂度高**:API 相对复杂,学习成本高 - **部署困难**:在某些服务器环境部署较复杂 ### 适用场景 ```javascript // 适合:动态网页、需要交互的场景 const puppeteer = require('puppeteer'); async function scrapeDynamicSite() { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com', { waitUntil: 'networkidle2' }); // 等待动态内容加载 await page.waitForSelector('.dynamic-content'); const data = await page.evaluate(() => { return { title: document.title, content: document.querySelector('.dynamic-content').textContent }; }); await browser.close(); return data; } ``` ## 4. 性能对比 ```javascript // Cheerio - 快速解析 const cheerio = require('cheerio'); async function cheerioBenchmark() { const start = Date.now(); const $ = cheerio.load(htmlString); const items = $('.item').map((i, el) => $(el).text()).get(); const time = Date.now() - start; console.log(`Cheerio: ${time}ms, ${items.length} items`); // 结果:通常 < 10ms } // Puppeteer - 完整浏览器 const puppeteer = require('puppeteer'); async function puppeteerBenchmark() { const start = Date.now(); const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.setContent(htmlString); const items = await page.$$eval('.item', elements => elements.map(el => el.textContent) ); await browser.close(); const time = Date.now() - start; console.log(`Puppeteer: ${time}ms, ${items.length} items`); // 结果:通常 500-2000ms } ``` ## 5. 选择建议 ### 使用 Cheerio 的场景 - 网站内容是静态 HTML - 需要处理大量页面 - 对性能要求高 - 只需要提取数据,不需要交互 - 服务器资源有限 ### 使用 Puppeteer 的场景 - 网站使用 JavaScript 动态加载内容 - 需要模拟用户操作(点击、滚动等) - 需要截图或生成 PDF - 需要处理复杂的 SPA 应用 - 需要监控网络请求 ### 混合使用场景 ```javascript // 先用 Puppeteer 获取动态内容,再用 Cheerio 解析 const puppeteer = require('puppeteer'); const cheerio = require('cheerio'); async function hybridScrape() { const browser = await puppeteer.launch(); const page = await browser.newPage(); // 使用 Puppeteer 加载动态页面 await page.goto('https://example.com/dynamic'); await page.waitForSelector('.content'); // 获取 HTML const html = await page.content(); await browser.close(); // 使用 Cheerio 快速解析 const $ = cheerio.load(html); const data = $('.item').map((i, el) => ({ title: $(el).find('.title').text(), content: $(el).find('.content').text() })).get(); return data; } ``` ## 6. 实际应用示例 ### Cheerio - 抓取静态博客 ```javascript async function scrapeBlog() { const response = await axios.get('https://blog.example.com'); const $ = cheerio.load(response.data); return $('.post').map((i, el) => ({ title: $(el).find('h2').text(), date: $(el).find('.date').text(), excerpt: $(el).find('.excerpt').text() })).get(); } ``` ### Puppeteer - 抓取动态电商网站 ```javascript async function scrapeShop() { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://shop.example.com'); // 滚动加载更多商品 for (let i = 0; i < 5; i++) { await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight)); await page.waitForTimeout(1000); } const products = await page.$$eval('.product', items => items.map(item => ({ name: item.querySelector('.name').textContent, price: item.querySelector('.price').textContent })) ); await browser.close(); return products; } ``` ## 总结 - **Cheerio**:适合静态页面、高性能需求、批量处理 - **Puppeteer**:适合动态页面、需要交互、可视化需求 - **混合使用**:先用 Puppeteer 加载动态内容,再用 Cheerio 解析,可以获得最佳的性能和功能平衡
服务端 · 2月22日 14:30
Puppeteer 如何与测试框架集成?有哪些 E2E 测试和 CI/CD 集成的最佳实践?Puppeteer 可以与各种测试框架集成,实现端到端测试、单元测试和集成测试。以下是常见的集成方式和最佳实践。 **1. 与 Jest 集成** **安装依赖:** ```bash npm install --save-dev puppeteer jest jest-puppeteer @types/puppeteer ``` **配置 Jest:** ```javascript // jest.config.js module.exports = { preset: 'jest-puppeteer', testMatch: ['**/*.test.js'], setupFilesAfterEnv: ['./jest.setup.js'] }; ``` **设置文件:** ```javascript // jest.setup.js beforeEach(async () => { await page.goto('http://localhost:3000'); }); ``` **编写测试:** ```javascript describe('Puppeteer with Jest', () => { test('page title', async () => { await expect(page.title()).resolves.toMatch('My App'); }); test('user can login', async () => { await page.type('#username', 'testuser'); await page.type('#password', 'password'); await page.click('#login-button'); await expect(page).toMatch('Welcome'); }); }); ``` **2. 与 Mocha 集成** **安装依赖:** ```bash npm install --save-dev puppeteer mocha chai ``` **配置 Mocha:** ```javascript // mocha.config.js module.exports = { timeout: 10000, require: ['mocha-setup.js'] }; ``` **设置文件:** ```javascript // mocha-setup.js const puppeteer = require('puppeteer'); const { expect } = require('chai'); let browser; let page; before(async () => { browser = await puppeteer.launch(); page = await browser.newPage(); }); after(async () => { await browser.close(); }); beforeEach(async () => { await page.goto('http://localhost:3000'); }); global.page = page; global.expect = expect; ``` **编写测试:** ```javascript describe('Puppeteer with Mocha', () => { it('should display correct title', async () => { const title = await page.title(); expect(title).to.equal('My App'); }); it('should allow user to login', async () => { await page.type('#username', 'testuser'); await page.type('#password', 'password'); await page.click('#login-button'); const welcomeText = await page.$eval('.welcome', el => el.textContent); expect(welcomeText).to.include('Welcome'); }); }); ``` **3. 与 Playwright 集成** **安装依赖:** ```bash npm install --save-dev @playwright/test ``` **配置 Playwright:** ```javascript // playwright.config.js module.exports = { testDir: './tests', use: { headless: true, screenshot: 'only-on-failure' } }; ``` **编写测试:** ```javascript const { test, expect } = require('@playwright/test'); test.describe('Puppeteer migration', () => { test('basic navigation', async ({ page }) => { await page.goto('http://localhost:3000'); await expect(page).toHaveTitle('My App'); }); test('form submission', async ({ page }) => { await page.goto('http://localhost:3000'); await page.fill('#username', 'testuser'); await page.fill('#password', 'password'); await page.click('#login-button'); await expect(page.locator('.welcome')).toBeVisible(); }); }); ``` **4. 与 Cypress 集成** **安装 Cypress:** ```bash npm install --save-dev cypress ``` **配置 Cypress:** ```javascript // cypress.config.js const { defineConfig } = require('cypress'); module.exports = defineConfig({ e2e: { baseUrl: 'http://localhost:3000', setupNodeEvents(on, config) { on('task', { puppeteer({ url, action }) { return require('./puppeteer-task')(url, action); } }); } } }); ``` **Puppeteer 任务:** ```javascript // puppeteer-task.js const puppeteer = require('puppeteer'); module.exports = async (url, action) => { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(url); let result; if (action === 'screenshot') { result = await page.screenshot({ encoding: 'base64' }); } else if (action === 'pdf') { result = await page.pdf({ encoding: 'base64' }); } await browser.close(); return result; }; ``` **Cypress 测试:** ```javascript // cypress/e2e/puppeteer.spec.js describe('Puppeteer integration', () => { it('should take screenshot with Puppeteer', () => { cy.task('puppeteer', { url: 'http://localhost:3000', action: 'screenshot' }).then(screenshot => { cy.log('Screenshot taken'); }); }); }); ``` **5. 端到端测试最佳实践** **测试结构:** ```javascript describe('User Flow', () => { before(async () => { // 设置测试环境 await setupTestDatabase(); }); after(async () => { // 清理测试环境 await cleanupTestDatabase(); }); beforeEach(async () => { // 每个测试前的准备 await page.goto('http://localhost:3000'); }); test('complete user registration flow', async () => { // 测试步骤 await page.click('#register-button'); await page.type('#username', 'newuser'); await page.type('#email', 'newuser@example.com'); await page.type('#password', 'password123'); await page.click('#submit-button'); // 验证结果 await expect(page).toMatch('Registration successful'); }); }); ``` **测试数据管理:** ```javascript // test-data.js module.exports = { validUser: { username: 'testuser', email: 'test@example.com', password: 'password123' }, invalidUser: { username: '', email: 'invalid-email', password: '123' } }; // 使用测试数据 const { validUser } = require('./test-data'); test('register with valid data', async () => { await page.type('#username', validUser.username); await page.type('#email', validUser.email); await page.type('#password', validUser.password); await page.click('#submit-button'); await expect(page).toMatch('Success'); }); ``` **6. 视觉回归测试** **使用 Percy:** ```bash npm install --save-dev @percy/puppeteer ``` ```javascript const percy = require('@percy/puppeteer'); describe('Visual regression tests', () => { beforeAll(async () => { await percy.start(); }); afterAll(async () => { await percy.stop(); }); test('homepage visual', async () => { await page.goto('http://localhost:3000'); await percy.snapshot(page, 'Homepage'); }); }); ``` **使用 BackstopJS:** ```javascript // backstop.config.js module.exports = { scenarios: [ { label: 'Homepage', url: 'http://localhost:3000', selectors: ['#header', '#main', '#footer'] } ], paths: { bitmaps_reference: 'backstop_data/bitmaps_reference', bitmaps_test: 'backstop_data/bitmaps_test', html_report: 'backstop_data/html_report' } }; ``` **7. 性能测试** **使用 Lighthouse:** ```bash npm install --save-dev lighthouse puppeteer ``` ```javascript const lighthouse = require('lighthouse'); const puppeteer = require('puppeteer'); describe('Performance tests', () => { test('page performance score', async () => { const browser = await puppeteer.launch(); const page = await browser.newPage(); const runnerResult = await lighthouse('http://localhost:3000', { port: new URL(browser.wsEndpoint()).port, output: 'json' }); const score = runnerResult.lhr.categories.performance.score * 100; expect(score).toBeGreaterThan(80); await browser.close(); }); }); ``` **8. 测试报告** **使用 Allure:** ```bash npm install --save-dev allure-commandline ``` ```javascript const allure = require('allure-js-commons'); describe('Allure reporting', () => { test('with Allure steps', async () => { const epic = new allure.Epic('User Management'); const feature = new allure.Feature('Registration'); const story = new allure.Story('User can register'); epic.addFeature(feature); feature.addStory(story); await page.click('#register-button'); await page.type('#username', 'testuser'); await page.click('#submit-button'); story.addStep('Click register button'); story.addStep('Enter username'); story.addStep('Submit form'); }); }); ``` **9. CI/CD 集成** **GitHub Actions 配置:** ```yaml name: Puppeteer Tests on: [push, pull_request] jobs: test: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - name: Setup Node.js uses: actions/setup-node@v2 with: node-version: '16' - name: Install dependencies run: npm ci - name: Install Chrome run: | sudo apt-get update sudo apt-get install -y chromium-browser - name: Run tests run: npm test env: CI: true ``` **Docker 配置:** ```dockerfile FROM node:16-alpine # 安装 Chrome RUN apk add --no-cache chromium # 安装依赖 COPY package*.json ./ RUN npm ci # 复制测试文件 COPY . . # 运行测试 CMD ["npm", "test"] ``` **10. 最佳实践总结** **1. 测试隔离:** ```javascript // 每个测试使用独立的上下文 beforeEach(async () => { const context = await browser.createIncognitoBrowserContext(); page = await context.newPage(); }); afterEach(async () => { await context.close(); }); ``` **2. 等待策略:** ```javascript // 使用明确的等待 await page.waitForSelector('.element', { visible: true }); // 避免硬编码延迟 // await page.waitForTimeout(5000); // 不推荐 ``` **3. 错误处理:** ```javascript test('with error handling', async () => { try { await page.click('#button'); } catch (error) { // 保存失败截图 await page.screenshot({ path: 'failure.png' }); throw error; } }); ``` **4. 测试数据清理:** ```javascript afterEach(async () => { // 清理测试数据 await cleanupTestData(); }); ``` **5. 并行测试:** ```javascript // Jest 配置 module.exports = { maxWorkers: 4, // 并行运行测试 preset: 'jest-puppeteer' }; ```
前端 · 2月19日 19:55
Puppeteer 如何处理动态网页和单页应用(SPA)?有哪些处理异步加载和路由变化的技巧?Puppeteer 在处理动态网页和单页应用(SPA)时具有独特的优势,可以执行 JavaScript、等待异步加载、处理路由变化等。 **1. 处理动态内容加载** **等待元素出现:** ```javascript const puppeteer = require('puppeteer'); async function scrapeDynamicContent() { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com'); // 等待动态加载的元素 await page.waitForSelector('.dynamic-content', { visible: true }); const content = await page.$eval('.dynamic-content', el => el.textContent); console.log(content); await browser.close(); } scrapeDynamicContent(); ``` **等待特定条件:** ```javascript await page.waitForFunction(() => { return document.querySelectorAll('.item').length > 0; }); ``` **等待网络请求完成:** ```javascript await page.goto('https://example.com', { waitUntil: 'networkidle2' }); ``` **2. 处理无限滚动** **基本无限滚动:** ```javascript async function scrapeInfiniteScroll() { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com/infinite-scroll'); const items = []; let previousHeight = 0; while (true) { // 滚动到底部 await page.evaluate(() => { window.scrollBy(0, window.innerHeight); }); // 等待新内容加载 await page.waitForTimeout(1000); // 检查是否有新内容 const currentHeight = await page.evaluate(() => document.body.scrollHeight); if (currentHeight === previousHeight) { break; // 没有新内容了 } previousHeight = currentHeight; // 收集数据 const newItems = await page.$$eval('.item', elements => { return elements.map(el => el.textContent); }); items.push(...newItems); } await browser.close(); return items; } ``` **优化的无限滚动:** ```javascript async function scrapeInfiniteScrollOptimized() { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com/infinite-scroll'); const items = []; let noNewItemsCount = 0; while (noNewItemsCount < 3) { // 连续 3 次没有新内容就停止 const itemCountBefore = items.length; // 滚动到底部 await page.evaluate(() => { window.scrollTo(0, document.body.scrollHeight); }); // 等待加载指示器消失 try { await page.waitForSelector('.loading', { hidden: true, timeout: 3000 }); } catch (error) { // 加载指示器可能不存在 } // 收集新数据 const newItems = await page.$$eval('.item', elements => { return elements.map(el => el.textContent); }); if (newItems.length === itemCountBefore) { noNewItemsCount++; } else { noNewItemsCount = 0; items.push(...newItems); } } await browser.close(); return items; } ``` **3. 处理 SPA 路由** **监听路由变化:** ```javascript async function handleSPARoutes() { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com'); // 监听路由变化 page.on('framenavigated', async (frame) => { console.log('Navigated to:', frame.url()); // 等待页面内容加载 await frame.waitForSelector('.content'); const title = await frame.$eval('.content', el => el.textContent); console.log('Page title:', title); }); // 点击导航链接 await page.click('#about-link'); await page.waitForTimeout(1000); await page.click('#contact-link'); await page.waitForTimeout(1000); await browser.close(); } ``` **等待特定路由:** ```javascript async function waitForRoute(page, path) { return new Promise((resolve) => { const checkRoute = async () => { const currentPath = await page.evaluate(() => window.location.pathname); if (currentPath === path) { resolve(); } else { setTimeout(checkRoute, 100); } }; checkRoute(); }); } // 使用 await page.click('#about-link'); await waitForRoute(page, '/about'); ``` **4. 处理 AJAX 请求** **等待特定 API 响应:** ```javascript async function waitForAPIResponse(page, urlPattern) { return new Promise((resolve) => { page.on('response', (response) => { if (response.url().includes(urlPattern)) { resolve(response); } }); }); } // 使用 const apiResponse = await Promise.all([ waitForAPIResponse(page, '/api/data'), page.click('#load-data-button') ]); const data = await apiResponse.json(); console.log(data); ``` **拦截和修改 API 请求:** ```javascript await page.setRequestInterception(true); page.on('request', (request) => { if (request.url().includes('/api/data')) { // 修改请求 request.continue({ headers: { ...request.headers(), 'Authorization': 'Bearer token' } }); } else { request.continue(); } }); ``` **5. 处理 WebSocket** **监听 WebSocket 消息:** ```javascript const client = await page.target().createCDPSession(); await client.send('Network.enable'); client.on('Network.webSocketFrameReceived', (params) => { console.log('WebSocket message:', params.response.payloadData); }); client.on('Network.webSocketFrameSent', (params) => { console.log('WebSocket sent:', params.response.payloadData); }); ``` **6. 处理客户端渲染** **等待客户端渲染完成:** ```javascript async function waitForClientRendering(page) { // 方法 1:等待特定元素 await page.waitForSelector('.rendered-content'); // 方法 2:等待渲染标志 await page.waitForFunction(() => { return window.__RENDER_COMPLETE__ === true; }); // 方法 3:等待网络空闲 await page.waitForFunction(() => { return performance.getEntriesByType('resource').length > 0; }); } ``` **处理 React/Vue 应用:** ```javascript async function scrapeReactApp() { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com/react-app'); // 等待 React 应用挂载 await page.waitForSelector('#root'); // 等待数据加载完成 await page.waitForFunction(() => { return window.__INITIAL_STATE__?.loaded === true; }); // 与 React 应用交互 await page.click('#load-more-button'); await page.waitForSelector('.new-items'); const items = await page.$$eval('.item', elements => { return elements.map(el => el.textContent); }); await browser.close(); return items; } ``` **7. 实际应用场景** **场景 1:抓取社交媒体动态内容** ```javascript async function scrapeSocialMediaPosts(username) { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(`https://social-media.com/${username}`); const posts = []; // 滚动加载更多帖子 while (posts.length < 50) { // 滚动到底部 await page.evaluate(() => { window.scrollBy(0, window.innerHeight); }); // 等待新帖子加载 await page.waitForTimeout(2000); // 收集帖子数据 const newPosts = await page.$$eval('.post', elements => { return elements.map(post => ({ id: post.dataset.id, content: post.querySelector('.content')?.textContent, likes: post.querySelector('.likes')?.textContent, timestamp: post.querySelector('.timestamp')?.textContent })); }); // 只添加新帖子 const newPostIds = new Set(posts.map(p => p.id)); const uniqueNewPosts = newPosts.filter(p => !newPostIds.has(p.id)); posts.push(...uniqueNewPosts); } await browser.close(); return posts; } ``` **场景 2:抓取电商网站商品列表** ```javascript async function scrapeEcommerceProducts(categoryUrl) { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(categoryUrl); const products = []; while (true) { // 等待商品加载 await page.waitForSelector('.product-card'); // 收集当前页商品 const pageProducts = await page.$$eval('.product-card', cards => { return cards.map(card => ({ id: card.dataset.id, title: card.querySelector('.title')?.textContent, price: card.querySelector('.price')?.textContent, rating: card.querySelector('.rating')?.textContent })); }); products.push(...pageProducts); // 检查是否有下一页 const nextButton = await page.$('.next-page:not(.disabled)'); if (!nextButton) { break; } // 点击下一页 await nextButton.click(); await page.waitForTimeout(1000); } await browser.close(); return products; } ``` **场景 3:抓取实时数据更新** ```javascript async function scrapeRealTimeData(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(url); const dataUpdates = []; // 监听 DOM 变化 await page.evaluate(() => { const observer = new MutationObserver((mutations) => { mutations.forEach((mutation) => { if (mutation.type === 'childList') { window.__DATA_UPDATES__ = window.__DATA_UPDATES__ || []; window.__DATA_UPDATES__.push({ timestamp: Date.now(), addedNodes: mutation.addedNodes.length }); } }); }); observer.observe(document.body, { childList: true, subtree: true }); }); // 等待一段时间收集数据 await page.waitForTimeout(30000); // 获取收集的数据 const updates = await page.evaluate(() => { return window.__DATA_UPDATES__ || []; }); await browser.close(); return updates; } ``` **8. 最佳实践** **1. 使用适当的等待策略:** ```javascript // 优先使用 waitForSelector await page.waitForSelector('.element'); // 复杂条件使用 waitForFunction await page.waitForFunction(() => { return document.querySelectorAll('.item').length > 10; }); // 网络请求使用 waitForResponse await page.waitForResponse(response => response.url().includes('/api/data') ); ``` **2. 避免硬编码等待时间:** ```javascript // 不好的做法 await page.waitForTimeout(5000); // 好的做法 await page.waitForSelector('.loaded-content'); ``` **3. 处理加载失败:** ```javascript try { await page.waitForSelector('.content', { timeout: 10000 }); } catch (error) { console.log('Content failed to load, using fallback'); // 使用备用策略 } ``` **4. 优化性能:** ```javascript // 禁用不必要的资源 await page.setRequestInterception(true); page.on('request', (request) => { if (['image', 'font', 'media'].includes(request.resourceType())) { request.abort(); } else { request.continue(); } }); ``` **5. 处理反爬虫:** ```javascript // 设置真实的用户代理 await page.setUserAgent('Mozilla/5.0 ...'); // 添加随机延迟 const randomDelay = () => Math.random() * 2000 + 1000; await page.waitForTimeout(randomDelay()); // 模拟人类行为 await page.evaluate(() => { window.scrollBy(0, Math.random() * 500); }); ```
前端 · 2月19日 19:49
Puppeteer 在实际项目中有哪些应用场景?请举例说明网页爬虫、自动化测试等具体实现。Puppeteer 在实际项目中有广泛的应用场景,从网页爬虫到自动化测试,从数据采集到性能监控。以下是一些典型的实际应用案例。 **1. 网页爬虫和数据采集** **案例 1:电商商品价格监控** ```javascript const puppeteer = require('puppeteer'); async function monitorProductPrices(productUrls) { const browser = await puppeteer.launch({ headless: 'new', args: ['--no-sandbox', '--disable-setuid-sandbox'] }); const results = []; for (const url of productUrls) { const page = await browser.newPage(); // 设置用户代理,避免被识别为爬虫 await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'); await page.goto(url, { waitUntil: 'networkidle2' }); // 等待价格元素加载 await page.waitForSelector('.price', { timeout: 5000 }); const productData = await page.evaluate(() => { return { title: document.querySelector('.product-title')?.textContent, price: document.querySelector('.price')?.textContent, availability: document.querySelector('.availability')?.textContent, rating: document.querySelector('.rating')?.textContent }; }); results.push({ url, ...productData, timestamp: new Date().toISOString() }); await page.close(); } await browser.close(); return results; } // 使用示例 const products = [ 'https://example.com/product/1', 'https://example.com/product/2' ]; monitorProductPrices(products).then(data => { console.log(JSON.stringify(data, null, 2)); }); ``` **案例 2:社交媒体数据抓取** ```javascript async function scrapeSocialMedia(username) { const browser = await puppeteer.launch({ headless: 'new' }); const page = await browser.newPage(); // 模拟登录 await page.goto('https://social-media.com/login'); await page.type('#username', 'your_username'); await page.type('#password', 'your_password'); await page.click('#login-button'); await page.waitForNavigation(); // 访问用户页面 await page.goto(`https://social-media.com/${username}`); // 滚动加载更多内容 while (true) { await page.evaluate(() => { window.scrollBy(0, window.innerHeight); }); try { await page.waitForSelector('.new-post', { timeout: 2000 }); } catch { break; } } // 抓取帖子数据 const posts = await page.evaluate(() => { return Array.from(document.querySelectorAll('.post')).map(post => ({ content: post.querySelector('.content')?.textContent, likes: post.querySelector('.likes')?.textContent, comments: post.querySelector('.comments')?.textContent, date: post.querySelector('.date')?.textContent })); }); await browser.close(); return posts; } ``` **2. 自动化测试** **案例 3:E2E 测试** ```javascript const { expect } = require('expect-puppeteer'); async function runE2ETest() { const browser = await puppeteer.launch({ headless: 'new', slowMo: 50 // 减慢操作速度,便于观察 }); const page = await browser.newPage(); try { // 测试用户注册流程 await page.goto('https://example.com/register'); // 填写注册表单 await page.type('#username', 'testuser'); await page.type('#email', 'test@example.com'); await page.type('#password', 'password123'); await page.type('#confirm-password', 'password123'); // 提交表单 await Promise.all([ page.waitForNavigation(), page.click('#register-button') ]); // 验证注册成功 await expect(page).toMatch('Welcome, testuser!'); // 测试登录流程 await page.click('#logout-button'); await page.waitForNavigation(); await page.type('#login-email', 'test@example.com'); await page.type('#login-password', 'password123'); await page.click('#login-button'); await page.waitForNavigation(); // 验证登录成功 await expect(page).toMatch('Welcome back!'); console.log('E2E test passed!'); } catch (error) { console.error('E2E test failed:', error); // 保存失败截图 await page.screenshot({ path: 'test-failure.png' }); } finally { await browser.close(); } } runE2ETest(); ``` **案例 4:视觉回归测试** ```javascript const fs = require('fs'); const pixelmatch = require('pixelmatch'); const { PNG } = require('pngjs'); async function visualRegressionTest(url, baselinePath) { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(url, { waitUntil: 'networkidle2' }); // 截取当前页面 const screenshot = await page.screenshot(); await browser.close(); // 如果没有基线图片,保存当前截图作为基线 if (!fs.existsSync(baselinePath)) { fs.writeFileSync(baselinePath, screenshot); console.log('Baseline image created'); return true; } // 读取基线图片 const baseline = PNG.sync.read(fs.readFileSync(baselinePath)); const current = PNG.sync.read(screenshot); // 比较图片差异 const diff = new PNG({ width: baseline.width, height: baseline.height }); const numDiffPixels = pixelmatch( baseline.data, current.data, diff.data, baseline.width, baseline.height, { threshold: 0.1 } ); // 保存差异图片 fs.writeFileSync('diff.png', PNG.sync.write(diff)); const totalPixels = baseline.width * baseline.height; const diffPercentage = (numDiffPixels / totalPixels) * 100; console.log(`Difference: ${diffPercentage.toFixed(2)}%`); // 如果差异超过阈值,测试失败 if (diffPercentage > 0.5) { console.log('Visual regression detected!'); return false; } console.log('Visual regression test passed!'); return true; } visualRegressionTest('https://example.com', 'baseline.png'); ``` **3. PDF 生成和文档处理** **案例 5:动态报表生成** ```javascript async function generateReport(data, outputPath) { const browser = await puppeteer.launch(); const page = await browser.newPage(); // 生成 HTML 报表 const html = ` <!DOCTYPE html> <html> <head> <style> body { font-family: Arial, sans-serif; padding: 40px; } h1 { color: #333; } table { width: 100%; border-collapse: collapse; margin-top: 20px; } th, td { border: 1px solid #ddd; padding: 12px; text-align: left; } th { background-color: #f2f2f2; } .summary { margin-top: 30px; padding: 20px; background-color: #f9f9f9; } </style> </head> <body> <h1>销售报表</h1> <p>生成时间: ${new Date().toLocaleString()}</p> <table> <thead> <tr> <th>产品</th> <th>数量</th> <th>单价</th> <th>总价</th> </tr> </thead> <tbody> ${data.map(item => ` <tr> <td>${item.product}</td> <td>${item.quantity}</td> <td>$${item.price.toFixed(2)}</td> <td>$${(item.quantity * item.price).toFixed(2)}</td> </tr> `).join('')} </tbody> </table> <div class="summary"> <h2>总计: $${data.reduce((sum, item) => sum + item.quantity * item.price, 0).toFixed(2)}</h2> </div> </body> </html> `; await page.setContent(html); // 生成 PDF await page.pdf({ path: outputPath, format: 'A4', printBackground: true, margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' } }); await browser.close(); console.log(`Report generated: ${outputPath}`); } // 使用示例 const salesData = [ { product: '产品 A', quantity: 10, price: 99.99 }, { product: '产品 B', quantity: 5, price: 149.99 }, { product: '产品 C', quantity: 8, price: 79.99 } ]; generateReport(salesData, 'sales-report.pdf'); ``` **案例 6:发票批量生成** ```javascript async function generateInvoices(invoices) { const browser = await puppeteer.launch(); const page = await browser.newPage(); for (const invoice of invoices) { const html = ` <!DOCTYPE html> <html> <head> <style> body { font-family: Arial, sans-serif; padding: 40px; } .header { text-align: center; margin-bottom: 40px; } .invoice-info { margin-bottom: 30px; } table { width: 100%; border-collapse: collapse; } th, td { border: 1px solid #ddd; padding: 10px; text-align: left; } th { background-color: #f2f2f2; } .total { text-align: right; font-weight: bold; margin-top: 20px; } </style> </head> <body> <div class="header"> <h1>发票</h1> <p>发票号: ${invoice.number}</p> </div> <div class="invoice-info"> <p>日期: ${invoice.date}</p> <p>客户: ${invoice.customer}</p> </div> <table> <thead> <tr> <th>项目</th> <th>数量</th> <th>单价</th> <th>总价</th> </tr> </thead> <tbody> ${invoice.items.map(item => ` <tr> <td>${item.name}</td> <td>${item.quantity}</td> <td>$${item.price}</td> <td>$${item.quantity * item.price}</td> </tr> `).join('')} </tbody> </table> <div class="total"> 总计: $${invoice.total} </div> </body> </html> `; await page.setContent(html); await page.pdf({ path: `invoices/invoice-${invoice.number}.pdf`, format: 'A4', printBackground: true }); console.log(`Generated invoice: ${invoice.number}`); } await browser.close(); } // 使用示例 const invoices = [ { number: 'INV-001', date: '2024-01-15', customer: '客户 A', items: [ { name: '服务 A', quantity: 1, price: 500 }, { name: '服务 B', quantity: 2, price: 300 } ], total: 1100 } ]; generateInvoices(invoices); ``` **4. 性能监控和分析** **案例 7:页面性能分析** ```javascript async function analyzePagePerformance(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); // 启用性能监控 const client = await page.target().createCDPSession(); await client.send('Performance.enable'); await client.send('Network.enable'); // 记录开始时间 const startTime = Date.now(); await page.goto(url, { waitUntil: 'networkidle2' }); const loadTime = Date.now() - startTime; // 获取性能指标 const metrics = await client.send('Performance.getMetrics'); // 获取关键性能指标 const performanceData = { loadTime, domContentLoaded: await page.evaluate(() => performance.timing.domContentLoadedEventEnd - performance.timing.navigationStart ), firstPaint: await page.evaluate(() => performance.getEntriesByType('paint')[0]?.startTime ), firstContentfulPaint: await page.evaluate(() => performance.getEntriesByType('paint')[1]?.startTime ), resources: metrics.metrics }; // 生成性能报告 console.log('Performance Report:'); console.log(`Load Time: ${performanceData.loadTime}ms`); console.log(`DOM Content Loaded: ${performanceData.domContentLoaded}ms`); console.log(`First Paint: ${performanceData.firstPaint}ms`); console.log(`First Contentful Paint: ${performanceData.firstContentfulPaint}ms`); await browser.close(); return performanceData; } analyzePagePerformance('https://example.com'); ``` **5. SEO 工具** **案例 8:SEO 检查工具** ```javascript async function seoAudit(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(url, { waitUntil: 'networkidle2' }); const seoData = await page.evaluate(() => { const issues = []; const warnings = []; // 检查标题 const title = document.querySelector('title'); if (!title) { issues.push('Missing title tag'); } else if (title.textContent.length > 60) { warnings.push('Title too long (> 60 characters)'); } // 检查描述 const description = document.querySelector('meta[name="description"]'); if (!description) { issues.push('Missing meta description'); } else if (description.content.length > 160) { warnings.push('Meta description too long (> 160 characters)'); } // 检查 H1 标签 const h1Tags = document.querySelectorAll('h1'); if (h1Tags.length === 0) { issues.push('Missing H1 tag'); } else if (h1Tags.length > 1) { warnings.push('Multiple H1 tags found'); } // 检查图片 alt 属性 const images = document.querySelectorAll('img'); let missingAlt = 0; images.forEach(img => { if (!img.alt) missingAlt++; }); if (missingAlt > 0) { warnings.push(`${missingAlt} images missing alt attributes`); } // 检查链接 const links = document.querySelectorAll('a[href]'); let brokenLinks = 0; links.forEach(link => { if (link.getAttribute('href').startsWith('#')) brokenLinks++; }); return { title: title?.textContent, description: description?.content, h1Count: h1Tags.length, imageCount: images.length, linkCount: links.length, issues, warnings }; }); console.log('SEO Audit Results:'); console.log(JSON.stringify(seoData, null, 2)); await browser.close(); return seoData; } seoAudit('https://example.com'); ``` **6. 最佳实践总结** **1. 错误处理:** ```javascript try { // 操作代码 } catch (error) { console.error('Error:', error); // 保存错误截图 await page.screenshot({ path: 'error.png' }); } finally { await browser.close(); } ``` **2. 资源管理:** ```javascript // 及时清理资源 await page.close(); await browser.close(); ``` **3. 性能优化:** ```javascript // 禁用不必要的资源 await page.setRequestInterception(true); page.on('request', (request) => { if (['image', 'font'].includes(request.resourceType())) { request.abort(); } else { request.continue(); } }); ``` **4. 反爬虫策略:** ```javascript // 设置真实的用户代理 await page.setUserAgent('Mozilla/5.0 ...'); // 添加延迟 await new Promise(resolve => setTimeout(resolve, 1000)); // 使用代理 const browser = await puppeteer.launch({ args: ['--proxy-server=http://proxy.example.com:8080'] }); ```
前端 · 2月19日 19:48
Puppeteer 如何实现页面交互和表单操作?有哪些常用的 API 和最佳实践?Puppeteer 提供了丰富的页面交互和表单操作功能,可以模拟用户的真实操作行为,这对于自动化测试和网页爬虫非常重要。 **1. 基本页面操作** **导航到页面:** ```javascript // 基本导航 await page.goto('https://example.com'); // 等待网络空闲 await page.goto('https://example.com', { waitUntil: 'networkidle2' }); // 设置超时时间 await page.goto('https://example.com', { timeout: 30000 }); // 等待特定条件 await page.goto('https://example.com', { waitUntil: ['load', 'domcontentloaded'] }); ``` **刷新页面:** ```javascript await page.reload(); await page.reload({ waitUntil: 'networkidle2' }); ``` **前进和后退:** ```javascript await page.goBack(); await page.goForward(); ``` **2. 元素选择** Puppeteer 支持多种选择器方式。 **使用 $ 选择单个元素:** ```javascript // 通过 CSS 选择器 const element = await page.$('#my-id'); const element = await page.$('.my-class'); const element = await page.$('div > p'); // 通过 XPath const element = await page.$x('//div[@class="my-class"]'); ``` **使用 $$ 选择多个元素:** ```javascript // 选择所有匹配的元素 const elements = await page.$$('.item'); console.log(elements.length); // 元素数量 // 遍历元素 for (const element of elements) { const text = await element.evaluate(el => el.textContent); console.log(text); } ``` **使用 $$eval 批量获取数据:** ```javascript // 获取所有元素的文本 const texts = await page.$$eval('.item', elements => { return elements.map(el => el.textContent); }); // 获取所有元素的属性 const hrefs = await page.$$eval('a', elements => { return elements.map(el => el.href); }); ``` **3. 点击操作** **基本点击:** ```javascript await page.click('#button'); await page.click('.submit-btn'); ``` **带选项的点击:** ```javascript await page.click('#button', { button: 'left', // 'left', 'right', 'middle' clickCount: 1, // 点击次数 delay: 100, // 点击延迟(毫秒) offset: { // 点击位置偏移 x: 10, y: 10 } }); ``` **双击:** ```javascript await page.click('#button', { clickCount: 2 }); ``` **右键点击:** ```javascript await page.click('#button', { button: 'right' }); ``` **等待元素可点击:** ```javascript await page.waitForSelector('#button', { visible: true }); await page.click('#button'); ``` **4. 文本输入** **基本输入:** ```javascript await page.type('#input', 'Hello World'); ``` **带选项的输入:** ```javascript await page.type('#input', 'Hello World', { delay: 100, // 每个字符的延迟(毫秒) clear: true // 输入前清空输入框 }); ``` **模拟真实打字速度:** ```javascript await page.type('#input', 'Hello World', { delay: 50 }); ``` **清空输入框:** ```javascript await page.click('#input'); await page.keyboard.down('Control'); await page.keyboard.press('A'); await page.keyboard.up('Control'); await page.keyboard.press('Backspace'); ``` **5. 键盘操作** **基本按键:** ```javascript await page.keyboard.press('Enter'); await page.keyboard.press('Tab'); await page.keyboard.press('Escape'); await page.keyboard.press('Backspace'); ``` **组合键:** ```javascript // Ctrl+C await page.keyboard.down('Control'); await page.keyboard.press('C'); await page.keyboard.up('Control'); // Ctrl+A (全选) await page.keyboard.down('Control'); await page.keyboard.press('A'); await page.keyboard.up('Control'); // Ctrl+V (粘贴) await page.keyboard.down('Control'); await page.keyboard.press('V'); await page.keyboard.up('Control'); ``` **特殊键:** ```javascript await page.keyboard.press('ArrowUp'); await page.keyboard.press('ArrowDown'); await page.keyboard.press('ArrowLeft'); await page.keyboard.press('ArrowRight'); await page.keyboard.press('PageUp'); await page.keyboard.press('PageDown'); await page.keyboard.press('Home'); await page.keyboard.press('End'); ``` **6. 鼠标操作** **移动鼠标:** ```javascript await page.mouse.move(100, 100); await page.mouse.move(100, 100, { steps: 10 }); // 平滑移动 ``` **点击鼠标:** ```javascript await page.mouse.click(100, 100); await page.mouse.click(100, 100, { button: 'left', clickCount: 1 }); ``` **按下和释放鼠标:** ```javascript await page.mouse.down(); await page.mouse.up(); // 拖拽操作 await page.mouse.down({ x: 100, y: 100 }); await page.mouse.move(200, 200, { steps: 10 }); await page.mouse.up(); ``` **7. 表单操作** **填写表单:** ```javascript // 文本输入 await page.type('#name', 'John Doe'); await page.type('#email', 'john@example.com'); // 选择下拉框 await page.selectOption('#country', 'CN'); await page.selectOption('#country', ['CN', 'US']); // 多选 // 复选框 await page.click('#checkbox'); const isChecked = await page.$eval('#checkbox', el => el.checked); // 单选框 await page.click('#radio-male'); // 文件上传 await page.setInputFiles('#file-upload', '/path/to/file.pdf'); await page.setInputFiles('#file-upload', ['/file1.pdf', '/file2.pdf']); ``` **提交表单:** ```javascript // 点击提交按钮 await page.click('#submit-button'); // 使用表单提交 await page.evaluate(() => { document.querySelector('form').submit(); }); ``` **8. 滚动操作** **滚动到页面底部:** ```javascript await page.evaluate(() => { window.scrollTo(0, document.body.scrollHeight); }); ``` **滚动到特定元素:** ```javascript await page.evaluate(() => { document.querySelector('#target').scrollIntoView(); }); ``` **平滑滚动:** ```javascript await page.evaluate(() => { window.scrollTo({ top: 1000, behavior: 'smooth' }); }); ``` **滚动指定距离:** ```javascript await page.evaluate(() => { window.scrollBy(0, 500); }); ``` **9. 等待元素** **等待元素出现:** ```javascript await page.waitForSelector('.result'); await page.waitForSelector('.result', { visible: true }); await page.waitForSelector('.result', { hidden: true }); ``` **等待 XPath:** ```javascript await page.waitForXPath('//div[@class="result"]'); ``` **等待函数:** ```javascript await page.waitForFunction(() => { return document.querySelectorAll('.item').length > 5; }); ``` **等待导航:** ```javascript await Promise.all([ page.waitForNavigation(), page.click('#link') ]); ``` **10. 获取元素信息** **获取文本内容:** ```javascript const text = await page.$eval('.title', el => el.textContent); ``` **获取属性:** ```javascript const href = await page.$eval('a', el => el.href); const id = await page.$eval('div', el => el.id); ``` **获取多个元素信息:** ```javascript const texts = await page.$$eval('.item', elements => { return elements.map(el => el.textContent); }); ``` **检查元素是否存在:** ```javascript const exists = await page.$('.element') !== null; ``` **检查元素是否可见:** ```javascript const isVisible = await page.$eval('.element', el => { return el.offsetParent !== null; }); ``` **11. 实际应用场景** **场景 1:登录表单填写** ```javascript async function login(url, username, password) { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(url); // 填写登录表单 await page.type('#username', username); await page.type('#password', password); // 点击登录按钮 await Promise.all([ page.waitForNavigation(), page.click('#login-button') ]); // 验证登录成功 const isLoggedIn = await page.$('.user-profile') !== null; await browser.close(); return isLoggedIn; } login('https://example.com/login', 'user@example.com', 'password'); ``` **场景 2:搜索功能测试** ```javascript async function testSearch(url, query) { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(url); // 输入搜索关键词 await page.type('#search-input', query); // 提交搜索 await Promise.all([ page.waitForNavigation(), page.keyboard.press('Enter') ]); // 等待搜索结果 await page.waitForSelector('.search-result'); // 获取结果数量 const resultCount = await page.$$eval('.search-result', results => { return results.length; }); console.log(`Found ${resultCount} results for "${query}"`); await browser.close(); return resultCount; } testSearch('https://example.com', 'puppeteer'); ``` **场景 3:分页数据抓取** ```javascript async function scrapePaginatedData(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); const allData = []; let hasNextPage = true; let pageNum = 1; while (hasNextPage) { await page.goto(`${url}?page=${pageNum}`); await page.waitForSelector('.item'); // 抓取当前页数据 const pageData = await page.$$eval('.item', items => { return items.map(item => ({ title: item.querySelector('.title').textContent, price: item.querySelector('.price').textContent })); }); allData.push(...pageData); console.log(`Scraped page ${pageNum}: ${pageData.length} items`); // 检查是否有下一页 hasNextPage = await page.$('.next-page:not(.disabled)') !== null; pageNum++; } await browser.close(); return allData; } scrapePaginatedData('https://example.com/products'); ``` **场景 4:动态内容加载** ```javascript async function scrapeDynamicContent(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(url); // 等待初始内容加载 await page.waitForSelector('.content'); // 滚动加载更多内容 while (true) { // 滚动到底部 await page.evaluate(() => { window.scrollTo(0, document.body.scrollHeight); }); // 等待新内容加载 try { await page.waitForSelector('.new-content', { timeout: 3000 }); } catch (error) { break; // 没有新内容了 } } // 获取所有内容 const allContent = await page.$$eval('.content-item', items => { return items.map(item => item.textContent); }); await browser.close(); return allContent; } scrapeDynamicContent('https://example.com/infinite-scroll'); ``` **12. 最佳实践** **1. 使用等待机制:** ```javascript // 好的做法 await page.waitForSelector('.button', { visible: true }); await page.click('.button'); // 不好的做法 await page.click('.button'); // 可能失败 ``` **2. 处理动态内容:** ```javascript // 等待网络空闲 await page.goto(url, { waitUntil: 'networkidle2' }); // 等待特定元素 await page.waitForSelector('.loaded-content'); ``` **3. 错误处理:** ```javascript try { await page.click('#button'); } catch (error) { console.error('Click failed:', error); // 重试逻辑 } ``` **4. 性能优化:** ```javascript // 禁用不必要的资源 await page.setRequestInterception(true); page.on('request', (request) => { if (['image', 'font', 'media'].includes(request.resourceType())) { request.abort(); } else { request.continue(); } }); ``` **5. 清理资源:** ```javascript try { // 操作代码 } finally { await browser.close(); } ```
前端 · 2月19日 19:48
Puppeteer 和 Selenium 有什么区别?在什么场景下应该选择 Puppeteer 而不是 Selenium?Puppeteer 和 Selenium 都是流行的浏览器自动化工具,但它们在设计理念、实现方式和使用场景上有显著差异。 **1. 架构差异** **Puppeteer:** - 基于 Chrome DevTools Protocol (CDP) - 直接与浏览器通信,无需中间层 - 专为 Chrome/Chromium 设计 - 使用 WebSocket 与浏览器建立连接 **Selenium:** - 基于 WebDriver 协议 - 通过 WebDriver 服务器与浏览器通信 - 支持多种浏览器(Chrome、Firefox、Safari、Edge 等) - 需要安装浏览器驱动程序 **2. 性能对比** **Puppeteer:** ```javascript // 启动速度快 const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com'); // 快速加载 ``` **Selenium:** ```javascript // 启动较慢 const driver = await new Builder() .forBrowser('chrome') .build(); await driver.get('https://example.com'); // 加载较慢 ``` **性能指标对比:** | 指标 | Puppeteer | Selenium | |------|-----------|----------| | 启动时间 | 快(1-2秒) | 慢(3-5秒) | | 执行速度 | 快 | 中等 | | 内存占用 | 较低 | 较高 | | 网络请求 | 直接通信 | 通过驱动 | **3. API 设计** **Puppeteer API:** ```javascript // 简洁直观的 API await page.click('#button'); await page.type('#input', 'text'); await page.waitForSelector('.result'); const text = await page.$eval('.title', el => el.textContent); ``` **Selenium API:** ```javascript // 相对复杂的 API await driver.findElement(By.id('button')).click(); await driver.findElement(By.id('input')).sendKeys('text'); await driver.wait(until.elementLocated(By.css('.result'))); const text = await driver.findElement(By.css('.title')).getText(); ``` **4. 浏览器支持** **Puppeteer:** - Chrome/Chromium(主要支持) - Firefox(实验性支持,通过 puppeteer-firefox) - 其他浏览器支持有限 **Selenium:** - Chrome - Firefox - Safari - Edge - Opera - Internet Explorer - 支持几乎所有主流浏览器 **5. 功能特性对比** **Puppeteer 特有功能:** ```javascript // 1. 网络拦截 await page.setRequestInterception(true); page.on('request', request => { if (request.resourceType() === 'image') { request.abort(); } else { request.continue(); } }); // 2. 性能追踪 const client = await page.target().createCDPSession(); await client.send('Performance.enable'); const metrics = await client.send('Performance.getMetrics'); // 3. 文件下载 const [download] = await Promise.all([ page.waitForEvent('download'), page.click('#download-button') ]); await download.saveAs('/path/to/save'); // 4. 设备模拟 const devices = puppeteer.devices; const iPhone = devices['iPhone 12']; await page.emulate(iPhone); // 5. 地理位置模拟 await page.setGeolocation({ latitude: 35.6895, longitude: 139.6917 }); ``` **Selenium 特有功能:** ```javascript // 1. 多浏览器支持 const driver = await new Builder() .forBrowser('firefox') .build(); // 2. 分布式测试(Selenium Grid) // 可以在多台机器上并行运行测试 // 3. 移动设备测试(Appium) // 支持原生移动应用测试 // 4. 高级等待机制 await driver.wait( until.titleIs('Expected Title'), 5000, 'Title did not match' ); // 5. Actions API(复杂交互) await driver.actions() .move({ origin: element }) .press() .move({ origin: targetElement }) .release() .perform(); ``` **6. 使用场景** **Puppeteer 适用场景:** - 网页爬虫和数据抓取 - 生成截图和 PDF - 性能测试和监控 - CI/CD 自动化测试 - SPA(单页应用)测试 - 需要网络拦截的场景 **Selenium 适用场景:** - 跨浏览器兼容性测试 - 大型企业级测试框架 - 分布式测试环境 - 需要支持多种浏览器的项目 - 移动应用测试(配合 Appium) - 传统 Web 应用测试 **7. 学习曲线** **Puppeteer:** - API 简洁直观 - 文档清晰易懂 - 学习曲线较平缓 - 适合初学者 **Selenium:** - API 相对复杂 - 需要理解 WebDriver 概念 - 学习曲线较陡峭 - 需要更多配置 **8. 社区和生态系统** **Puppeteer:** - Google 官方维护 - 活跃的 GitHub 社区 - 丰富的插件生态 - 持续更新和改进 **Selenium:** - 开源社区维护 - 成熟的生态系统 - 大量第三方工具和集成 - 广泛的企业应用 **9. 实际代码对比** **任务:登录并获取用户信息** **Puppeteer 实现:** ```javascript const puppeteer = require('puppeteer'); (async () => { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com/login'); // 填写表单 await page.type('#username', 'user@example.com'); await page.type('#password', 'password123'); // 提交表单并等待导航 await Promise.all([ page.waitForNavigation(), page.click('#login-button') ]); // 获取用户信息 const userInfo = await page.evaluate(() => { return { name: document.querySelector('.user-name').textContent, email: document.querySelector('.user-email').textContent }; }); console.log(userInfo); await browser.close(); })(); ``` **Selenium 实现:** ```javascript const { Builder, By, until } = require('selenium-webdriver'); (async () => { const driver = await new Builder() .forBrowser('chrome') .build(); await driver.get('https://example.com/login'); // 填写表单 await driver.findElement(By.id('username')).sendKeys('user@example.com'); await driver.findElement(By.id('password')).sendKeys('password123'); // 提交表单并等待导航 await Promise.all([ driver.wait(until.titleContains('Dashboard'), 5000), driver.findElement(By.id('login-button')).click() ]); // 获取用户信息 const userInfo = { name: await driver.findElement(By.css('.user-name')).getText(), email: await driver.findElement(By.css('.user-email')).getText() }; console.log(userInfo); await driver.quit(); })(); ``` **10. 选择建议** **选择 Puppeteer 如果:** - 主要使用 Chrome/Chromium - 需要高性能和快速执行 - 需要网络拦截或性能分析 - 项目规模较小或中等 - 团队熟悉 Node.js - 需要生成截图或 PDF **选择 Selenium 如果:** - 需要支持多种浏览器 - 需要跨浏览器兼容性测试 - 项目规模较大或企业级 - 需要分布式测试环境 - 需要测试移动应用 - 团队已有 Selenium 经验 **11. 混合使用策略** 在某些项目中,可以结合两者的优势: ```javascript // 使用 Puppeteer 进行快速开发和测试 const puppeteer = require('puppeteer'); // 使用 Selenium 进行跨浏览器验证 const { Builder, By } = require('selenium-webdriver'); async function testWithPuppeteer() { // 快速测试主要功能 } async function testWithSelenium() { // 跨浏览器兼容性测试 } ``` **总结:** Puppeteer 和 Selenium 各有优势,选择哪个工具取决于项目需求、团队技能和测试场景。Puppeteer 更适合现代 Web 应用和快速开发,而 Selenium 更适合需要跨浏览器支持的企业级测试框架。
前端 · 2月19日 19:47
Puppeteer 如何使用 Chrome DevTools Protocol (CDP) 进行高级调试和性能分析?Puppeteer 提供了丰富的 Chrome DevTools Protocol (CDP) 功能,允许开发者访问浏览器底层的调试和性能分析能力。 **1. CDP 基础** **创建 CDP 会话:** ```javascript const client = await page.target().createCDPSession(); ``` **启用 CDP 域:** ```javascript await client.send('Performance.enable'); await client.send('Network.enable'); await client.send('Runtime.enable'); ``` **发送 CDP 命令:** ```javascript const result = await client.send('Performance.getMetrics'); console.log(result); ``` **监听 CDP 事件:** ```javascript client.on('Network.requestWillBeSent', (params) => { console.log('Request:', params.request.url); }); ``` **2. 性能监控** **启用性能监控:** ```javascript const client = await page.target().createCDPSession(); await client.send('Performance.enable'); ``` **获取性能指标:** ```javascript const metrics = await client.send('Performance.getMetrics'); console.log('Performance Metrics:', metrics.metrics); ``` **关键性能指标:** ```javascript const metrics = await client.send('Performance.getMetrics'); const metricMap = {}; metrics.metrics.forEach(m => metricMap[m.name] = m.value); console.log({ Timestamp: metricMap.Timestamp, Documents: metricMap.Documents, Frames: metricMap.Frames, JSEventListeners: metricMap.JSEventListeners, Nodes: metricMap.Nodes, LayoutCount: metricMap.LayoutCount, RecalcStyleCount: metricMap.RecalcStyleCount, LayoutDuration: metricMap.LayoutDuration, RecalcStyleDuration: metricMap.RecalcStyleDuration, ScriptDuration: metricMap.ScriptDuration, TaskDuration: metricMap.TaskDuration }); ``` **性能追踪:** ```javascript // 开始追踪 await client.send('Performance.enable'); await client.send('Tracing.start', { traceConfig: { includedCategories: ['devtools.timeline', 'blink.user_timing'] } }); // 执行操作 await page.goto('https://example.com'); // 停止追踪 const traceData = await client.send('Tracing.stop'); ``` **3. 网络监控** **启用网络监控:** ```javascript const client = await page.target().createCDPSession(); await client.send('Network.enable'); ``` **监控网络请求:** ```javascript client.on('Network.requestWillBeSent', (params) => { console.log('Request:', { url: params.request.url, method: params.request.method, type: params.type }); }); ``` **监控网络响应:** ```javascript client.on('Network.responseReceived', (params) => { console.log('Response:', { url: params.response.url, status: params.response.status, mimeType: params.response.mimeType }); }); ``` **获取请求体:** ```javascript client.on('Network.requestWillBeSent', async (params) => { if (params.request.postData) { console.log('Request body:', params.request.postData); } }); ``` **获取响应体:** ```javascript client.on('Network.responseReceived', async (params) => { const responseBody = await client.send('Network.getResponseBody', { requestId: params.requestId }); console.log('Response body:', responseBody.body); }); ``` **4. 运行时调试** **启用运行时监控:** ```javascript const client = await page.target().createCDPSession(); await client.send('Runtime.enable'); ``` **执行 JavaScript:** ```javascript const result = await client.send('Runtime.evaluate', { expression: 'document.title' }); console.log('Result:', result.result.value); ``` **获取控制台日志:** ```javascript client.on('Runtime.consoleAPICalled', (params) => { console.log('Console:', params.type, params.args); }); ``` **监听异常:** ```javascript client.on('Runtime.exceptionThrown', (params) => { console.error('Exception:', params.exceptionDetails); }); ``` **5. DOM 监控** **启用 DOM 监控:** ```javascript const client = await page.target().createCDPSession(); await client.send('DOM.enable'); ``` **获取文档根节点:** ```javascript const root = await client.send('DOM.getDocument'); console.log('Root node:', root.root); ``` **查询节点:** ```javascript const result = await client.send('DOM.querySelector', { nodeId: root.root.nodeId, selector: '.my-element' }); console.log('Node:', result.nodeId); ``` **获取节点属性:** ```javascript const attributes = await client.send('DOM.getAttributes', { nodeId: result.nodeId }); console.log('Attributes:', attributes.attributes); ``` **6. Page 监控** **启用 Page 监控:** ```javascript const client = await page.target().createCDPSession(); await client.send('Page.enable'); ``` **监听页面加载:** ```javascript client.on('Page.loadEventFired', () => { console.log('Page loaded'); }); ``` **监听导航:** ```javascript client.on('Page.frameNavigated', (params) => { console.log('Navigated to:', params.frame.url); }); ``` **获取页面资源树:** ```javascript const resourceTree = await client.send('Page.getResourceTree'); console.log('Resource tree:', resourceTree); ``` **7. 实际应用场景** **场景 1:性能分析工具** ```javascript async function analyzePerformance(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); const client = await page.target().createCDPSession(); // 启用性能监控 await client.send('Performance.enable'); await client.send('Network.enable'); const startTime = Date.now(); await page.goto(url, { waitUntil: 'networkidle2' }); const loadTime = Date.now() - startTime; // 获取性能指标 const metrics = await client.send('Performance.getMetrics'); const metricMap = {}; metrics.metrics.forEach(m => metricMap[m.name] = m.value); // 收集网络数据 const networkData = []; client.on('Network.requestWillBeSent', (params) => { networkData.push({ url: params.request.url, method: params.request.method, timestamp: params.timestamp }); }); const report = { url, loadTime, metrics: { layoutDuration: metricMap.LayoutDuration, recalcStyleDuration: metricMap.RecalcStyleDuration, scriptDuration: metricMap.ScriptDuration, taskDuration: metricMap.TaskDuration }, networkRequests: networkData.length }; await browser.close(); return report; } analyzePerformance('https://example.com').then(console.log); ``` **场景 2:网络请求分析** ```javascript async function analyzeNetworkRequests(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); const client = await page.target().createCDPSession(); await client.send('Network.enable'); const requests = []; client.on('Network.requestWillBeSent', (params) => { requests.push({ requestId: params.requestId, url: params.request.url, method: params.request.method, type: params.type, timestamp: params.timestamp }); }); client.on('Network.responseReceived', (params) => { const request = requests.find(r => r.requestId === params.requestId); if (request) { request.status = params.response.status; request.mimeType = params.response.mimeType; request.size = params.response.encodedDataLength; } }); await page.goto(url, { waitUntil: 'networkidle2' }); // 分析请求 const analysis = { totalRequests: requests.length, byType: {}, byStatus: {}, totalSize: 0 }; requests.forEach(req => { // 按类型统计 if (!analysis.byType[req.type]) { analysis.byType[req.type] = { count: 0, size: 0 }; } analysis.byType[req.type].count++; analysis.byType[req.type].size += req.size || 0; // 按状态码统计 if (!analysis.byStatus[req.status]) { analysis.byStatus[req.status] = 0; } analysis.byStatus[req.status]++; analysis.totalSize += req.size || 0; }); await browser.close(); return analysis; } analyzeNetworkRequests('https://example.com').then(console.log); ``` **场景 3:内存分析** ```javascript async function analyzeMemory(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); const client = await page.target().createCDPSession(); await client.send('Runtime.enable'); await client.send('HeapProfiler.enable'); await page.goto(url, { waitUntil: 'networkidle2' }); // 获取堆快照 const heapSnapshot = await client.send('HeapProfiler.takeHeapSnapshot', { reportProgress: false }); // 获取内存使用情况 const memoryMetrics = await client.send('Runtime.getHeapUsage'); const report = { totalSize: memoryMetrics.totalSize, usedSize: memoryMetrics.usedSize, heapSnapshot: heapSnapshot }; await browser.close(); return report; } analyzeMemory('https://example.com').then(console.log); ``` **场景 4:JavaScript 执行分析** ```javascript async function analyzeJavaScript(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); const client = await page.target().createCDPSession(); await client.send('Runtime.enable'); await client.send('Debugger.enable'); const consoleLogs = []; const exceptions = []; client.on('Runtime.consoleAPICalled', (params) => { consoleLogs.push({ type: params.type, args: params.args.map(arg => arg.value) }); }); client.on('Runtime.exceptionThrown', (params) => { exceptions.push({ message: params.exceptionDetails.exception?.description, stackTrace: params.exceptionDetails.stackTrace }); }); await page.goto(url, { waitUntil: 'networkidle2' }); const report = { consoleLogs, exceptions, hasErrors: exceptions.length > 0 }; await browser.close(); return report; } analyzeJavaScript('https://example.com').then(console.log); ``` **8. CDP 高级功能** **覆盖代码:** ```javascript const client = await page.target().createCDPSession(); await client.send('DOM.enable'); await client.send('CSS.enable'); // 启用代码覆盖 await client.send('Profiler.enable'); await client.send('Profiler.startPreciseCoverage', { callCount: true, detailed: true }); // 执行操作 await page.goto('https://example.com'); // 获取覆盖数据 const coverage = await client.send('Profiler.takePreciseCoverage'); console.log('Coverage:', coverage.result); ``` **监控长任务:** ```javascript const client = await page.target().createCDPSession(); await client.send('Performance.enable'); client.on('Performance.metrics', (params) => { params.metrics.forEach(metric => { if (metric.name === 'TaskDuration' && metric.value > 50) { console.warn('Long task detected:', metric.value, 'ms'); } }); }); ``` **监控布局抖动:** ```javascript const client = await page.target().createCDPSession(); await client.send('Performance.enable'); const layoutShifts = []; client.on('Performance.metrics', (params) => { params.metrics.forEach(metric => { if (metric.name === 'LayoutShift') { layoutShifts.push(metric.value); } }); }); // 计算累积布局偏移 const cls = layoutShifts.reduce((sum, shift) => sum + shift, 0); console.log('Cumulative Layout Shift:', cls); ``` **9. 最佳实践** **1. 及时禁用 CDP 域:** ```javascript try { await client.send('Performance.enable'); // 操作 } finally { await client.send('Performance.disable'); } ``` **2. 批量获取数据:** ```javascript // 一次性获取多个指标 const [metrics, networkData] = await Promise.all([ client.send('Performance.getMetrics'), client.send('Network.getResponseBody', { requestId: 'xxx' }) ]); ``` **3. 使用事件过滤:** ```javascript client.on('Network.requestWillBeSent', (params) => { // 只处理特定请求 if (params.request.url.includes('/api/')) { console.log('API Request:', params.request.url); } }); ``` **4. 错误处理:** ```javascript try { await client.send('Performance.getMetrics'); } catch (error) { console.error('CDP error:', error); // 降级处理 } ```
前端 · 2月19日 19:40
Puppeteer 中有哪些等待机制?如何正确使用它们来处理异步操作?Puppeteer 提供了多种等待机制来处理异步操作和页面加载,确保在执行操作前页面状态已就绪。 **1. page.waitForNavigation()** 等待页面导航完成,适用于点击链接、提交表单等会触发页面跳转的操作。 ```javascript await Promise.all([ page.waitForNavigation(), page.click('#submit-button') ]); ``` **参数选项:** - `waitUntil`: 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2' - `timeout`: 超时时间(毫秒) **2. page.waitForSelector(selector)** 等待指定选择器出现在页面中。 ```javascript await page.waitForSelector('.result-item', { visible: true }); ``` **参数选项:** - `visible`: 等待元素可见 - `hidden`: 等待元素隐藏 - `timeout`: 超时时间 **3. page.waitForXPath(xpath)** 等待 XPath 选择器匹配的元素。 ```javascript await page.waitForXPath('//div[@class="content"]'); ``` **4. page.waitForFunction(pageFunction, ...args)** 等待自定义函数返回真值,最灵活的等待方式。 ```javascript await page.waitForFunction( () => document.querySelectorAll('.item').length > 5 ); // 带参数 await page.waitForFunction( (count) => document.querySelectorAll('.item').length >= count, {}, 10 ); ``` **5. page.waitForTimeout(milliseconds)** 等待指定时间(已废弃,建议使用 setTimeout)。 ```javascript // 旧方法(已废弃) await page.waitForTimeout(1000); // 新方法 await new Promise(resolve => setTimeout(resolve, 1000)); ``` **6. page.waitForResponse(urlOrPredicate)** 等待特定的网络响应。 ```javascript // 等待特定 URL 的响应 await page.waitForResponse('https://api.example.com/data'); // 使用谓词函数 await page.waitForResponse(response => response.url().includes('/api/') && response.status() === 200 ); ``` **7. page.waitForRequest(urlOrPredicate)** 等待特定的网络请求。 ```javascript await page.waitForRequest(request => request.url().includes('/api/data') ); ``` **8. page.waitForFrame(frame)** 等待指定的 iframe 加载完成。 ```javascript const frame = await page.waitForFrame('iframe-name'); ``` **最佳实践:** **1. 选择合适的等待方法:** - 导航操作 → `waitForNavigation` - 元素操作 → `waitForSelector` - 复杂条件 → `waitForFunction` - API 调用 → `waitForResponse` **2. 设置合理的超时时间:** ```javascript await page.waitForSelector('.element', { timeout: 5000 // 5 秒超时 }); ``` **3. 使用 Promise.all 并行等待:** ```javascript await Promise.all([ page.waitForNavigation(), page.click('#link'), page.waitForSelector('.loaded') ]); ``` **4. 处理超时异常:** ```javascript try { await page.waitForSelector('.element', { timeout: 3000 }); } catch (error) { console.log('Element not found within timeout'); } ``` **5. 优化等待策略:** ```javascript // 等待网络空闲(推荐) await page.waitForNavigation({ waitUntil: 'networkidle2' }); // 等待特定元素可见 await page.waitForSelector('.element', { visible: true }); ``` **常见问题解决:** **问题 1:元素存在但不可见** ```javascript // 解决方案:等待元素可见 await page.waitForSelector('.element', { visible: true }); ``` **问题 2:动态加载内容** ```javascript // 解决方案:使用 waitForFunction 检查内容 await page.waitForFunction(() => document.querySelectorAll('.item').length > 0 ); ``` **问题 3:SPA 路由变化** ```javascript // 解决方案:等待 URL 变化 await page.waitForFunction(() => window.location.pathname === '/new-page' ); ```
前端 · 2月19日 19:40
Puppeteer 如何进行错误处理和调试?有哪些常用的调试技巧和工具?Puppeteer 提供了多种错误处理和调试技巧,帮助开发者快速定位和解决问题,提高开发效率。 **1. 基本错误处理** **try-catch 模式:** ```javascript const puppeteer = require('puppeteer'); async function safeExecution() { const browser = await puppeteer.launch(); const page = await browser.newPage(); try { await page.goto('https://example.com'); await page.click('#button'); } catch (error) { console.error('Error occurred:', error.message); // 错误处理逻辑 } finally { await browser.close(); } } safeExecution(); ``` **超时处理:** ```javascript try { await page.goto('https://example.com', { timeout: 5000 }); } catch (error) { if (error.name === 'TimeoutError') { console.log('Page load timeout'); } } ``` **2. 调试模式** **启用调试模式:** ```javascript // 方法 1:使用 headless: false const browser = await puppeteer.launch({ headless: false, slowMo: 100 // 减慢操作速度 }); // 方法 2:使用 devtools const browser = await puppeteer.launch({ headless: false, devtools: true }); ``` **使用 slowMo:** ```javascript const browser = await puppeteer.launch({ headless: false, slowMo: 50 // 每个操作延迟 50ms }); ``` **3. 日志记录** **控制台日志:** ```javascript page.on('console', msg => { console.log('Browser console:', msg.text()); }); // 捕获不同类型的日志 page.on('console', msg => { const type = msg.type(); const text = msg.text(); if (type === 'error') { console.error('Browser error:', text); } else if (type === 'warning') { console.warn('Browser warning:', text); } else { console.log('Browser log:', text); } }); ``` **页面错误日志:** ```javascript page.on('pageerror', error => { console.error('Page error:', error.message); }); ``` **请求失败日志:** ```javascript page.on('requestfailed', request => { console.log('Request failed:', request.url()); console.log('Failure:', request.failure()); }); ``` **4. 截图和视频录制** **错误时截图:** ```javascript async function withErrorScreenshot(page, operation) { try { await operation(); } catch (error) { const timestamp = new Date().toISOString().replace(/[:.]/g, '-'); await page.screenshot({ path: `error-${timestamp}.png`, fullPage: true }); throw error; } } // 使用示例 await withErrorScreenshot(page, async () => { await page.goto('https://example.com'); await page.click('#button'); }); ``` **视频录制:** ```javascript const { spawn } = require('child_process'); async function recordVideo(page, outputPath, operation) { // 使用 ffmpeg 录制屏幕 const ffmpeg = spawn('ffmpeg', [ '-f', 'x11grab', '-r', '30', '-s', '1920x1080', '-i', ':99', '-c:v', 'libx264', '-preset', 'ultrafast', outputPath ]); try { await operation(); } finally { ffmpeg.kill('SIGINT'); } } ``` **5. 网络调试** **监控网络请求:** ```javascript page.on('request', request => { console.log('Request:', request.url()); }); page.on('response', response => { console.log('Response:', response.url(), response.status()); }); page.on('requestfinished', request => { console.log('Request finished:', request.url()); }); ``` **捕获请求和响应数据:** ```javascript const requests = []; page.on('request', request => { requests.push({ url: request.url(), method: request.method(), headers: request.headers() }); }); page.on('response', async response => { const request = requests.find(r => r.url === response.url()); if (request) { request.status = response.status(); request.headers = response.headers(); try { request.body = await response.text(); } catch (error) { request.body = null; } } }); ``` **6. 性能追踪** **启用性能追踪:** ```javascript const client = await page.target().createCDPSession(); await client.send('Performance.enable'); await client.send('Network.enable'); // 获取性能指标 const metrics = await client.send('Performance.getMetrics'); console.log('Performance metrics:', metrics); ``` **追踪时间线:** ```javascript await page.tracing.start({ path: 'trace.json' }); // 执行操作 await page.goto('https://example.com'); await page.tracing.stop(); ``` **7. 元素调试** **高亮元素:** ```javascript async function highlightElement(page, selector) { await page.evaluate(selector => { const element = document.querySelector(selector); if (element) { element.style.border = '3px solid red'; element.style.backgroundColor = 'yellow'; } }, selector); } ``` **检查元素状态:** ```javascript async function checkElement(page, selector) { const isVisible = await page.isVisible(selector); const isEnabled = await page.isDisabled(selector); const isClickable = await page.isClickable(selector); console.log('Element state:', { selector, isVisible, isEnabled, isClickable }); } ``` **获取元素位置:** ```javascript const position = await page.evaluate(selector => { const element = document.querySelector(selector); if (element) { const rect = element.getBoundingClientRect(); return { x: rect.left, y: rect.top, width: rect.width, height: rect.height }; } }, '.element'); ``` **8. 调试工具函数** **等待并调试:** ```javascript async function waitForAndDebug(page, selector, options = {}) { console.log(`Waiting for selector: ${selector}`); try { await page.waitForSelector(selector, { timeout: options.timeout || 30000, visible: options.visible !== false }); console.log(`Found selector: ${selector}`); } catch (error) { console.error(`Failed to find selector: ${selector}`); await page.screenshot({ path: 'debug-failed.png' }); throw error; } } ``` **点击并调试:** ```javascript async function clickAndDebug(page, selector) { console.log(`Attempting to click: ${selector}`); try { // 检查元素是否存在 const element = await page.$(selector); if (!element) { throw new Error(`Element not found: ${selector}`); } // 检查元素是否可见 const isVisible = await element.isIntersectingViewport(); if (!isVisible) { console.warn('Element is not visible, scrolling to it'); await element.scrollIntoView(); } await element.click(); console.log(`Successfully clicked: ${selector}`); } catch (error) { console.error(`Failed to click: ${selector}`, error); await page.screenshot({ path: 'debug-click-failed.png' }); throw error; } } ``` **9. 常见错误及解决方案** **错误 1:元素未找到** ```javascript // 问题:元素选择器错误 await page.click('.wrong-selector'); // 解决方案:使用正确的选择器 await page.click('.correct-selector'); // 或者等待元素出现 await page.waitForSelector('.correct-selector'); await page.click('.correct-selector'); ``` **错误 2:元素不可点击** ```javascript // 问题:元素被遮挡或不可见 await page.click('.hidden-button'); // 解决方案:滚动到元素 await page.evaluate(selector => { document.querySelector(selector).scrollIntoView(); }, '.hidden-button'); await page.click('.hidden-button'); ``` **错误 3:超时错误** ```javascript // 问题:页面加载超时 await page.goto('https://slow-website.com'); // 解决方案:增加超时时间 await page.goto('https://slow-website.com', { timeout: 60000 }); // 或使用更宽松的等待条件 await page.goto('https://slow-website.com', { waitUntil: 'domcontentloaded' }); ``` **错误 4:内存泄漏** ```javascript // 问题:未关闭浏览器实例 const browser = await puppeteer.launch(); // 忘记关闭 // 解决方案:使用 finally 确保关闭 const browser = await puppeteer.launch(); try { // 操作 } finally { await browser.close(); } ``` **10. 调试最佳实践** **1. 使用描述性日志:** ```javascript console.log(`[INFO] Navigating to ${url}`); console.log(`[DEBUG] Found ${elements.length} elements`); console.log(`[ERROR] Failed to click button: ${error.message}`); ``` **2. 保存调试信息:** ```javascript const debugInfo = { url: page.url(), timestamp: new Date().toISOString(), screenshot: await page.screenshot({ encoding: 'base64' }), html: await page.content(), cookies: await page.cookies() }; require('fs').writeFileSync('debug.json', JSON.stringify(debugInfo, null, 2)); ``` **3. 使用条件断点:** ```javascript await page.evaluate(() => { debugger; // 在浏览器中暂停 }); ``` **4. 分步调试:** ```javascript // 使用 slowMo 减慢操作 const browser = await puppeteer.launch({ slowMo: 100 }); // 或在关键步骤添加延迟 await new Promise(resolve => setTimeout(resolve, 1000)); ``` **5. 使用调试器:** ```javascript // 在代码中添加 debugger debugger; // 使用 Node.js 调试器运行 node --inspect-brk script.js ``` **11. 测试和验证** **单元测试示例:** ```javascript const assert = require('assert'); async function testPageLoad() { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com'); const title = await page.title(); assert.strictEqual(title, 'Example Domain'); await browser.close(); } testPageLoad().catch(console.error); ``` **集成测试示例:** ```javascript async function testUserFlow() { const browser = await puppeteer.launch(); const page = await browser.newPage(); // 测试登录流程 await page.goto('https://example.com/login'); await page.type('#username', 'testuser'); await page.type('#password', 'password'); await page.click('#login-button'); // 验证登录成功 await page.waitForSelector('.user-profile'); const isLoggedIn = await page.$('.user-profile') !== null; assert(isLoggedIn, 'Login failed'); await browser.close(); } testUserFlow().catch(console.error); ```
前端 · 2月19日 19:40
Puppeteer 如何管理 Cookie 和存储?如何实现会话持久化和多账户管理?Puppeteer 提供了强大的 Cookie 和存储管理功能,可以模拟真实的用户会话、保持登录状态、管理本地存储等。 **1. Cookie 管理** **获取所有 Cookie:** ```javascript const cookies = await page.cookies(); console.log(cookies); ``` **获取特定 URL 的 Cookie:** ```javascript const cookies = await page.cookies('https://example.com'); ``` **设置 Cookie:** ```javascript await page.setCookie({ name: 'session_id', value: 'abc123', domain: '.example.com', path: '/', expires: Math.floor(Date.now() / 1000) + 3600, // 1 小时后过期 httpOnly: true, secure: true, sameSite: 'Lax' }); ``` **设置多个 Cookie:** ```javascript await page.setCookie( { name: 'cookie1', value: 'value1', domain: '.example.com' }, { name: 'cookie2', value: 'value2', domain: '.example.com' } ); ``` **删除 Cookie:** ```javascript // 删除指定 Cookie await page.deleteCookie({ name: 'session_id', domain: '.example.com' }); // 删除所有 Cookie const cookies = await page.cookies(); await page.deleteCookie(...cookies); ``` **清除所有 Cookie:** ```javascript await page.evaluate(() => { document.cookie.split(";").forEach(c => { document.cookie = c.replace(/^ +/, "").replace(/=.*/, "=;expires=" + new Date().toUTCString() + ";path=/"); }); }); ``` **2. LocalStorage 管理** **获取 LocalStorage 数据:** ```javascript const localStorageData = await page.evaluate(() => { const data = {}; for (let i = 0; i < localStorage.length; i++) { const key = localStorage.key(i); data[key] = localStorage.getItem(key); } return data; }); ``` **设置 LocalStorage 数据:** ```javascript await page.evaluate(() => { localStorage.setItem('user_id', '12345'); localStorage.setItem('preferences', JSON.stringify({ theme: 'dark' })); }); ``` **获取特定 LocalStorage 项:** ```javascript const userId = await page.evaluate(() => { return localStorage.getItem('user_id'); }); ``` **删除 LocalStorage 项:** ```javascript await page.evaluate(() => { localStorage.removeItem('user_id'); }); ``` **清除所有 LocalStorage:** ```javascript await page.evaluate(() => { localStorage.clear(); }); ``` **3. SessionStorage 管理** **获取 SessionStorage 数据:** ```javascript const sessionStorageData = await page.evaluate(() => { const data = {}; for (let i = 0; i < sessionStorage.length; i++) { const key = sessionStorage.key(i); data[key] = sessionStorage.getItem(key); } return data; }); ``` **设置 SessionStorage 数据:** ```javascript await page.evaluate(() => { sessionStorage.setItem('temp_data', 'temporary_value'); }); ``` **清除所有 SessionStorage:** ```javascript await page.evaluate(() => { sessionStorage.clear(); }); ``` **4. IndexedDB 管理** **获取 IndexedDB 数据:** ```javascript const indexedDBData = await page.evaluate(async () => { return new Promise((resolve, reject) => { const request = indexedDB.open('myDatabase', 1); request.onsuccess = (event) => { const db = event.target.result; const transaction = db.transaction(['myStore'], 'readonly'); const store = transaction.objectStore('myStore'); const getAllRequest = store.getAll(); getAllRequest.onsuccess = () => { resolve(getAllRequest.result); }; getAllRequest.onerror = () => { reject(getAllRequest.error); }; }; request.onerror = () => { reject(request.error); }; }); }); ``` **5. 浏览器上下文和隔离** **使用 Incognito 上下文:** ```javascript const context = await browser.createIncognitoBrowserContext(); const page = await context.newPage(); // 在隔离环境中操作 await page.goto('https://example.com'); // 关闭上下文,清除所有数据 await context.close(); ``` **多个隔离上下文:** ```javascript // 创建多个隔离的上下文 const context1 = await browser.createIncognitoBrowserContext(); const context2 = await browser.createIncognitoBrowserContext(); const page1 = await context1.newPage(); const page2 = await context2.newPage(); // 两个上下文的 Cookie 和存储完全隔离 ``` **6. 会话持久化** **保存会话状态:** ```javascript async function saveSession(page, filePath) { const cookies = await page.cookies(); const localStorage = await page.evaluate(() => { const data = {}; for (let i = 0; i < localStorage.length; i++) { const key = localStorage.key(i); data[key] = localStorage.getItem(key); } return data; }); const session = { cookies, localStorage, url: page.url(), timestamp: Date.now() }; const fs = require('fs'); fs.writeFileSync(filePath, JSON.stringify(session, null, 2)); } ``` **恢复会话状态:** ```javascript async function restoreSession(page, filePath) { const fs = require('fs'); const session = JSON.parse(fs.readFileSync(filePath, 'utf8')); // 恢复 Cookie await page.setCookie(...session.cookies); // 恢复 LocalStorage await page.evaluate((data) => { for (const [key, value] of Object.entries(data)) { localStorage.setItem(key, value); } }, session.localStorage); // 导航到之前的 URL await page.goto(session.url); } ``` **7. 实际应用场景** **场景 1:保持登录状态** ```javascript async function loginAndSaveSession() { const browser = await puppeteer.launch(); const page = await browser.newPage(); // 登录 await page.goto('https://example.com/login'); await page.type('#username', 'user@example.com'); await page.type('#password', 'password'); await page.click('#login-button'); await page.waitForNavigation(); // 保存会话 await saveSession(page, 'session.json'); await browser.close(); } async function useSavedSession() { const browser = await puppeteer.launch(); const page = await browser.newPage(); // 恢复会话 await restoreSession(page, 'session.json'); // 直接访问需要登录的页面 await page.goto('https://example.com/dashboard'); // 验证是否已登录 const isLoggedIn = await page.$('.user-profile') !== null; console.log('Is logged in:', isLoggedIn); await browser.close(); } ``` **场景 2:多账户管理** ```javascript async function manageMultipleAccounts(accounts) { const browser = await puppeteer.launch(); for (const account of accounts) { // 为每个账户创建隔离的上下文 const context = await browser.createIncognitoBrowserContext(); const page = await context.newPage(); // 登录账户 await page.goto('https://example.com/login'); await page.type('#username', account.username); await page.type('#password', account.password); await page.click('#login-button'); await page.waitForNavigation(); // 执行账户操作 await page.goto('https://example.com/dashboard'); const data = await page.evaluate(() => { return document.querySelector('.user-data').textContent; }); console.log(`Account ${account.username}: ${data}`); // 关闭上下文,清除数据 await context.close(); } await browser.close(); } manageMultipleAccounts([ { username: 'user1@example.com', password: 'pass1' }, { username: 'user2@example.com', password: 'pass2' } ]); ``` **场景 3:A/B 测试** ```javascript async function abTesting(url, variants) { const browser = await puppeteer.launch(); for (const variant of variants) { const context = await browser.createIncognitoBrowserContext(); const page = await context.newPage(); // 设置 A/B 测试 Cookie await page.setCookie({ name: 'ab_test_variant', value: variant.id, domain: new URL(url).hostname }); await page.goto(url); // 收集数据 const data = await page.evaluate(() => { return { title: document.title, content: document.querySelector('.content')?.textContent }; }); console.log(`Variant ${variant.id}:`, data); await context.close(); } await browser.close(); } abTesting('https://example.com', [ { id: 'A' }, { id: 'B' } ]); ``` **场景 4:购物车持久化** ```javascript async function saveShoppingCart(page, userId) { const cartData = await page.evaluate(() => { return JSON.parse(localStorage.getItem('cart') || '[]'); }); const fs = require('fs'); const filePath = `carts/${userId}.json`; fs.writeFileSync(filePath, JSON.stringify(cartData, null, 2)); } async function restoreShoppingCart(page, userId) { const fs = require('fs'); const filePath = `carts/${userId}.json`; if (fs.existsSync(filePath)) { const cartData = JSON.parse(fs.readFileSync(filePath, 'utf8')); await page.evaluate((data) => { localStorage.setItem('cart', JSON.stringify(data)); }, cartData); } } ``` **8. 安全注意事项** **1. 敏感数据保护:** ```javascript // 不要在代码中硬编码敏感信息 // 使用环境变量 const password = process.env.PASSWORD; // 不要将包含敏感信息的会话文件提交到版本控制 // 将 session.json 添加到 .gitignore ``` **2. Cookie 安全:** ```javascript // 设置安全的 Cookie 属性 await page.setCookie({ name: 'session', value: 'value', httpOnly: true, // 防止 XSS 攻击 secure: true, // 仅通过 HTTPS 传输 sameSite: 'Strict' // 防止 CSRF 攻击 }); ``` **3. 会话过期处理:** ```javascript async function checkSessionValidity(page) { const cookies = await page.cookies(); const sessionCookie = cookies.find(c => c.name === 'session_id'); if (!sessionCookie || sessionCookie.expires * 1000 < Date.now()) { // 会话已过期,重新登录 await relogin(page); } } ``` **9. 最佳实践** **1. 使用隔离上下文:** ```javascript // 为每个用户或会话创建隔离的上下文 const context = await browser.createIncognitoBrowserContext(); const page = await context.newPage(); // 操作完成后关闭上下文 await context.close(); ``` **2. 定期清理:** ```javascript // 定期清理过期的 Cookie 和存储 async function cleanupStorage(page) { const cookies = await page.cookies(); const validCookies = cookies.filter(c => !c.expires || c.expires * 1000 > Date.now() ); await page.deleteCookie(...cookies); await page.setCookie(...validCookies); } ``` **3. 错误处理:** ```javascript try { await page.setCookie(cookie); } catch (error) { console.error('Failed to set cookie:', error); // 处理错误 } ``` **4. 性能优化:** ```javascript // 批量操作 Cookie await page.setCookie(...cookies); // 避免频繁的存储操作 const data = await page.evaluate(() => { // 一次性获取所有需要的数据 return { localStorage: { ...localStorage }, sessionStorage: { ...sessionStorage } }; }); ```
前端 · 2月19日 19:39