Puppeteer
Puppeteer 是一个 Node.js 库,它提供了一个高级 API 来通过 DevTools 协议控制无头 Chrome 或 Chromium。它还可以配置为使用完整(非无头)Chrome 或 Chromium。

查看更多相关内容
Cheerio 和 Puppeteer 有什么区别?如何选择使用?Cheerio 和 Puppeteer 都是 Node.js 中用于处理网页的工具,但它们的设计目标和使用场景有显著差异:
## 1. 核心区别
| 特性 | Cheerio | Puppeteer |
|------|---------|-----------|
| **类型** | HTML 解析器 | 浏览器自动化工具 |
| **JavaScript 执行** | 不支持 | 完全支持 |
| **动态内容** | 无法处理 | 完全支持 |
| **性能** | 极快 | 较慢 |
| **资源消耗** | 低 | 高 |
| **API** | jQuery 风格 | 浏览器 DevTools 协议 |
| **使用场景** | 静态 HTML 解析 | 动态网页、截图、PDF |
## 2. Cheerio 的特点
### 优势
- **轻量快速**:核心代码只有几百行,解析速度极快
- **简单易用**:jQuery 风格的 API,学习成本低
- **低资源消耗**:不需要启动浏览器,内存占用少
- **适合批量处理**:可以快速处理大量静态页面
### 局限性
- **无法执行 JavaScript**:只能解析静态 HTML
- **无法处理动态内容**:无法获取通过 JS 动态加载的数据
- **无法处理复杂交互**:不支持点击、滚动等用户操作
- **无法截图或生成 PDF**:没有可视化能力
### 适用场景
```javascript
// 适合:静态网页数据提取
const cheerio = require('cheerio');
const axios = require('axios');
async function scrapeStaticSite() {
const response = await axios.get('https://example.com');
const $ = cheerio.load(response.data);
return {
title: $('title').text(),
links: $('a').map((i, el) => $(el).attr('href')).get()
};
}
```
## 3. Puppeteer 的特点
### 优势
- **完整浏览器环境**:使用真实的 Chrome/Chromium
- **JavaScript 执行**:可以执行页面中的所有 JavaScript
- **动态内容支持**:可以获取 AJAX 加载的数据
- **交互能力**:支持点击、输入、滚动等操作
- **可视化功能**:支持截图、生成 PDF
- **网络拦截**:可以监控和修改网络请求
### 局限性
- **资源消耗大**:需要启动完整的浏览器实例
- **速度较慢**:相比 Cheerio 慢很多
- **复杂度高**:API 相对复杂,学习成本高
- **部署困难**:在某些服务器环境部署较复杂
### 适用场景
```javascript
// 适合:动态网页、需要交互的场景
const puppeteer = require('puppeteer');
async function scrapeDynamicSite() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com', { waitUntil: 'networkidle2' });
// 等待动态内容加载
await page.waitForSelector('.dynamic-content');
const data = await page.evaluate(() => {
return {
title: document.title,
content: document.querySelector('.dynamic-content').textContent
};
});
await browser.close();
return data;
}
```
## 4. 性能对比
```javascript
// Cheerio - 快速解析
const cheerio = require('cheerio');
async function cheerioBenchmark() {
const start = Date.now();
const $ = cheerio.load(htmlString);
const items = $('.item').map((i, el) => $(el).text()).get();
const time = Date.now() - start;
console.log(`Cheerio: ${time}ms, ${items.length} items`);
// 结果:通常 < 10ms
}
// Puppeteer - 完整浏览器
const puppeteer = require('puppeteer');
async function puppeteerBenchmark() {
const start = Date.now();
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setContent(htmlString);
const items = await page.$$eval('.item', elements =>
elements.map(el => el.textContent)
);
await browser.close();
const time = Date.now() - start;
console.log(`Puppeteer: ${time}ms, ${items.length} items`);
// 结果:通常 500-2000ms
}
```
## 5. 选择建议
### 使用 Cheerio 的场景
- 网站内容是静态 HTML
- 需要处理大量页面
- 对性能要求高
- 只需要提取数据,不需要交互
- 服务器资源有限
### 使用 Puppeteer 的场景
- 网站使用 JavaScript 动态加载内容
- 需要模拟用户操作(点击、滚动等)
- 需要截图或生成 PDF
- 需要处理复杂的 SPA 应用
- 需要监控网络请求
### 混合使用场景
```javascript
// 先用 Puppeteer 获取动态内容,再用 Cheerio 解析
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');
async function hybridScrape() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// 使用 Puppeteer 加载动态页面
await page.goto('https://example.com/dynamic');
await page.waitForSelector('.content');
// 获取 HTML
const html = await page.content();
await browser.close();
// 使用 Cheerio 快速解析
const $ = cheerio.load(html);
const data = $('.item').map((i, el) => ({
title: $(el).find('.title').text(),
content: $(el).find('.content').text()
})).get();
return data;
}
```
## 6. 实际应用示例
### Cheerio - 抓取静态博客
```javascript
async function scrapeBlog() {
const response = await axios.get('https://blog.example.com');
const $ = cheerio.load(response.data);
return $('.post').map((i, el) => ({
title: $(el).find('h2').text(),
date: $(el).find('.date').text(),
excerpt: $(el).find('.excerpt').text()
})).get();
}
```
### Puppeteer - 抓取动态电商网站
```javascript
async function scrapeShop() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://shop.example.com');
// 滚动加载更多商品
for (let i = 0; i < 5; i++) {
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
await page.waitForTimeout(1000);
}
const products = await page.$$eval('.product', items =>
items.map(item => ({
name: item.querySelector('.name').textContent,
price: item.querySelector('.price').textContent
}))
);
await browser.close();
return products;
}
```
## 总结
- **Cheerio**:适合静态页面、高性能需求、批量处理
- **Puppeteer**:适合动态页面、需要交互、可视化需求
- **混合使用**:先用 Puppeteer 加载动态内容,再用 Cheerio 解析,可以获得最佳的性能和功能平衡
服务端 · 2月22日 14:30
Puppeteer 如何与测试框架集成?有哪些 E2E 测试和 CI/CD 集成的最佳实践?Puppeteer 可以与各种测试框架集成,实现端到端测试、单元测试和集成测试。以下是常见的集成方式和最佳实践。
**1. 与 Jest 集成**
**安装依赖:**
```bash
npm install --save-dev puppeteer jest jest-puppeteer @types/puppeteer
```
**配置 Jest:**
```javascript
// jest.config.js
module.exports = {
preset: 'jest-puppeteer',
testMatch: ['**/*.test.js'],
setupFilesAfterEnv: ['./jest.setup.js']
};
```
**设置文件:**
```javascript
// jest.setup.js
beforeEach(async () => {
await page.goto('http://localhost:3000');
});
```
**编写测试:**
```javascript
describe('Puppeteer with Jest', () => {
test('page title', async () => {
await expect(page.title()).resolves.toMatch('My App');
});
test('user can login', async () => {
await page.type('#username', 'testuser');
await page.type('#password', 'password');
await page.click('#login-button');
await expect(page).toMatch('Welcome');
});
});
```
**2. 与 Mocha 集成**
**安装依赖:**
```bash
npm install --save-dev puppeteer mocha chai
```
**配置 Mocha:**
```javascript
// mocha.config.js
module.exports = {
timeout: 10000,
require: ['mocha-setup.js']
};
```
**设置文件:**
```javascript
// mocha-setup.js
const puppeteer = require('puppeteer');
const { expect } = require('chai');
let browser;
let page;
before(async () => {
browser = await puppeteer.launch();
page = await browser.newPage();
});
after(async () => {
await browser.close();
});
beforeEach(async () => {
await page.goto('http://localhost:3000');
});
global.page = page;
global.expect = expect;
```
**编写测试:**
```javascript
describe('Puppeteer with Mocha', () => {
it('should display correct title', async () => {
const title = await page.title();
expect(title).to.equal('My App');
});
it('should allow user to login', async () => {
await page.type('#username', 'testuser');
await page.type('#password', 'password');
await page.click('#login-button');
const welcomeText = await page.$eval('.welcome', el => el.textContent);
expect(welcomeText).to.include('Welcome');
});
});
```
**3. 与 Playwright 集成**
**安装依赖:**
```bash
npm install --save-dev @playwright/test
```
**配置 Playwright:**
```javascript
// playwright.config.js
module.exports = {
testDir: './tests',
use: {
headless: true,
screenshot: 'only-on-failure'
}
};
```
**编写测试:**
```javascript
const { test, expect } = require('@playwright/test');
test.describe('Puppeteer migration', () => {
test('basic navigation', async ({ page }) => {
await page.goto('http://localhost:3000');
await expect(page).toHaveTitle('My App');
});
test('form submission', async ({ page }) => {
await page.goto('http://localhost:3000');
await page.fill('#username', 'testuser');
await page.fill('#password', 'password');
await page.click('#login-button');
await expect(page.locator('.welcome')).toBeVisible();
});
});
```
**4. 与 Cypress 集成**
**安装 Cypress:**
```bash
npm install --save-dev cypress
```
**配置 Cypress:**
```javascript
// cypress.config.js
const { defineConfig } = require('cypress');
module.exports = defineConfig({
e2e: {
baseUrl: 'http://localhost:3000',
setupNodeEvents(on, config) {
on('task', {
puppeteer({ url, action }) {
return require('./puppeteer-task')(url, action);
}
});
}
}
});
```
**Puppeteer 任务:**
```javascript
// puppeteer-task.js
const puppeteer = require('puppeteer');
module.exports = async (url, action) => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
let result;
if (action === 'screenshot') {
result = await page.screenshot({ encoding: 'base64' });
} else if (action === 'pdf') {
result = await page.pdf({ encoding: 'base64' });
}
await browser.close();
return result;
};
```
**Cypress 测试:**
```javascript
// cypress/e2e/puppeteer.spec.js
describe('Puppeteer integration', () => {
it('should take screenshot with Puppeteer', () => {
cy.task('puppeteer', {
url: 'http://localhost:3000',
action: 'screenshot'
}).then(screenshot => {
cy.log('Screenshot taken');
});
});
});
```
**5. 端到端测试最佳实践**
**测试结构:**
```javascript
describe('User Flow', () => {
before(async () => {
// 设置测试环境
await setupTestDatabase();
});
after(async () => {
// 清理测试环境
await cleanupTestDatabase();
});
beforeEach(async () => {
// 每个测试前的准备
await page.goto('http://localhost:3000');
});
test('complete user registration flow', async () => {
// 测试步骤
await page.click('#register-button');
await page.type('#username', 'newuser');
await page.type('#email', 'newuser@example.com');
await page.type('#password', 'password123');
await page.click('#submit-button');
// 验证结果
await expect(page).toMatch('Registration successful');
});
});
```
**测试数据管理:**
```javascript
// test-data.js
module.exports = {
validUser: {
username: 'testuser',
email: 'test@example.com',
password: 'password123'
},
invalidUser: {
username: '',
email: 'invalid-email',
password: '123'
}
};
// 使用测试数据
const { validUser } = require('./test-data');
test('register with valid data', async () => {
await page.type('#username', validUser.username);
await page.type('#email', validUser.email);
await page.type('#password', validUser.password);
await page.click('#submit-button');
await expect(page).toMatch('Success');
});
```
**6. 视觉回归测试**
**使用 Percy:**
```bash
npm install --save-dev @percy/puppeteer
```
```javascript
const percy = require('@percy/puppeteer');
describe('Visual regression tests', () => {
beforeAll(async () => {
await percy.start();
});
afterAll(async () => {
await percy.stop();
});
test('homepage visual', async () => {
await page.goto('http://localhost:3000');
await percy.snapshot(page, 'Homepage');
});
});
```
**使用 BackstopJS:**
```javascript
// backstop.config.js
module.exports = {
scenarios: [
{
label: 'Homepage',
url: 'http://localhost:3000',
selectors: ['#header', '#main', '#footer']
}
],
paths: {
bitmaps_reference: 'backstop_data/bitmaps_reference',
bitmaps_test: 'backstop_data/bitmaps_test',
html_report: 'backstop_data/html_report'
}
};
```
**7. 性能测试**
**使用 Lighthouse:**
```bash
npm install --save-dev lighthouse puppeteer
```
```javascript
const lighthouse = require('lighthouse');
const puppeteer = require('puppeteer');
describe('Performance tests', () => {
test('page performance score', async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
const runnerResult = await lighthouse('http://localhost:3000', {
port: new URL(browser.wsEndpoint()).port,
output: 'json'
});
const score = runnerResult.lhr.categories.performance.score * 100;
expect(score).toBeGreaterThan(80);
await browser.close();
});
});
```
**8. 测试报告**
**使用 Allure:**
```bash
npm install --save-dev allure-commandline
```
```javascript
const allure = require('allure-js-commons');
describe('Allure reporting', () => {
test('with Allure steps', async () => {
const epic = new allure.Epic('User Management');
const feature = new allure.Feature('Registration');
const story = new allure.Story('User can register');
epic.addFeature(feature);
feature.addStory(story);
await page.click('#register-button');
await page.type('#username', 'testuser');
await page.click('#submit-button');
story.addStep('Click register button');
story.addStep('Enter username');
story.addStep('Submit form');
});
});
```
**9. CI/CD 集成**
**GitHub Actions 配置:**
```yaml
name: Puppeteer Tests
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Node.js
uses: actions/setup-node@v2
with:
node-version: '16'
- name: Install dependencies
run: npm ci
- name: Install Chrome
run: |
sudo apt-get update
sudo apt-get install -y chromium-browser
- name: Run tests
run: npm test
env:
CI: true
```
**Docker 配置:**
```dockerfile
FROM node:16-alpine
# 安装 Chrome
RUN apk add --no-cache chromium
# 安装依赖
COPY package*.json ./
RUN npm ci
# 复制测试文件
COPY . .
# 运行测试
CMD ["npm", "test"]
```
**10. 最佳实践总结**
**1. 测试隔离:**
```javascript
// 每个测试使用独立的上下文
beforeEach(async () => {
const context = await browser.createIncognitoBrowserContext();
page = await context.newPage();
});
afterEach(async () => {
await context.close();
});
```
**2. 等待策略:**
```javascript
// 使用明确的等待
await page.waitForSelector('.element', { visible: true });
// 避免硬编码延迟
// await page.waitForTimeout(5000); // 不推荐
```
**3. 错误处理:**
```javascript
test('with error handling', async () => {
try {
await page.click('#button');
} catch (error) {
// 保存失败截图
await page.screenshot({ path: 'failure.png' });
throw error;
}
});
```
**4. 测试数据清理:**
```javascript
afterEach(async () => {
// 清理测试数据
await cleanupTestData();
});
```
**5. 并行测试:**
```javascript
// Jest 配置
module.exports = {
maxWorkers: 4, // 并行运行测试
preset: 'jest-puppeteer'
};
```
前端 · 2月19日 19:55
Puppeteer 如何处理动态网页和单页应用(SPA)?有哪些处理异步加载和路由变化的技巧?Puppeteer 在处理动态网页和单页应用(SPA)时具有独特的优势,可以执行 JavaScript、等待异步加载、处理路由变化等。
**1. 处理动态内容加载**
**等待元素出现:**
```javascript
const puppeteer = require('puppeteer');
async function scrapeDynamicContent() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
// 等待动态加载的元素
await page.waitForSelector('.dynamic-content', { visible: true });
const content = await page.$eval('.dynamic-content', el => el.textContent);
console.log(content);
await browser.close();
}
scrapeDynamicContent();
```
**等待特定条件:**
```javascript
await page.waitForFunction(() => {
return document.querySelectorAll('.item').length > 0;
});
```
**等待网络请求完成:**
```javascript
await page.goto('https://example.com', {
waitUntil: 'networkidle2'
});
```
**2. 处理无限滚动**
**基本无限滚动:**
```javascript
async function scrapeInfiniteScroll() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com/infinite-scroll');
const items = [];
let previousHeight = 0;
while (true) {
// 滚动到底部
await page.evaluate(() => {
window.scrollBy(0, window.innerHeight);
});
// 等待新内容加载
await page.waitForTimeout(1000);
// 检查是否有新内容
const currentHeight = await page.evaluate(() => document.body.scrollHeight);
if (currentHeight === previousHeight) {
break; // 没有新内容了
}
previousHeight = currentHeight;
// 收集数据
const newItems = await page.$$eval('.item', elements => {
return elements.map(el => el.textContent);
});
items.push(...newItems);
}
await browser.close();
return items;
}
```
**优化的无限滚动:**
```javascript
async function scrapeInfiniteScrollOptimized() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com/infinite-scroll');
const items = [];
let noNewItemsCount = 0;
while (noNewItemsCount < 3) { // 连续 3 次没有新内容就停止
const itemCountBefore = items.length;
// 滚动到底部
await page.evaluate(() => {
window.scrollTo(0, document.body.scrollHeight);
});
// 等待加载指示器消失
try {
await page.waitForSelector('.loading', { hidden: true, timeout: 3000 });
} catch (error) {
// 加载指示器可能不存在
}
// 收集新数据
const newItems = await page.$$eval('.item', elements => {
return elements.map(el => el.textContent);
});
if (newItems.length === itemCountBefore) {
noNewItemsCount++;
} else {
noNewItemsCount = 0;
items.push(...newItems);
}
}
await browser.close();
return items;
}
```
**3. 处理 SPA 路由**
**监听路由变化:**
```javascript
async function handleSPARoutes() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
// 监听路由变化
page.on('framenavigated', async (frame) => {
console.log('Navigated to:', frame.url());
// 等待页面内容加载
await frame.waitForSelector('.content');
const title = await frame.$eval('.content', el => el.textContent);
console.log('Page title:', title);
});
// 点击导航链接
await page.click('#about-link');
await page.waitForTimeout(1000);
await page.click('#contact-link');
await page.waitForTimeout(1000);
await browser.close();
}
```
**等待特定路由:**
```javascript
async function waitForRoute(page, path) {
return new Promise((resolve) => {
const checkRoute = async () => {
const currentPath = await page.evaluate(() => window.location.pathname);
if (currentPath === path) {
resolve();
} else {
setTimeout(checkRoute, 100);
}
};
checkRoute();
});
}
// 使用
await page.click('#about-link');
await waitForRoute(page, '/about');
```
**4. 处理 AJAX 请求**
**等待特定 API 响应:**
```javascript
async function waitForAPIResponse(page, urlPattern) {
return new Promise((resolve) => {
page.on('response', (response) => {
if (response.url().includes(urlPattern)) {
resolve(response);
}
});
});
}
// 使用
const apiResponse = await Promise.all([
waitForAPIResponse(page, '/api/data'),
page.click('#load-data-button')
]);
const data = await apiResponse.json();
console.log(data);
```
**拦截和修改 API 请求:**
```javascript
await page.setRequestInterception(true);
page.on('request', (request) => {
if (request.url().includes('/api/data')) {
// 修改请求
request.continue({
headers: {
...request.headers(),
'Authorization': 'Bearer token'
}
});
} else {
request.continue();
}
});
```
**5. 处理 WebSocket**
**监听 WebSocket 消息:**
```javascript
const client = await page.target().createCDPSession();
await client.send('Network.enable');
client.on('Network.webSocketFrameReceived', (params) => {
console.log('WebSocket message:', params.response.payloadData);
});
client.on('Network.webSocketFrameSent', (params) => {
console.log('WebSocket sent:', params.response.payloadData);
});
```
**6. 处理客户端渲染**
**等待客户端渲染完成:**
```javascript
async function waitForClientRendering(page) {
// 方法 1:等待特定元素
await page.waitForSelector('.rendered-content');
// 方法 2:等待渲染标志
await page.waitForFunction(() => {
return window.__RENDER_COMPLETE__ === true;
});
// 方法 3:等待网络空闲
await page.waitForFunction(() => {
return performance.getEntriesByType('resource').length > 0;
});
}
```
**处理 React/Vue 应用:**
```javascript
async function scrapeReactApp() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com/react-app');
// 等待 React 应用挂载
await page.waitForSelector('#root');
// 等待数据加载完成
await page.waitForFunction(() => {
return window.__INITIAL_STATE__?.loaded === true;
});
// 与 React 应用交互
await page.click('#load-more-button');
await page.waitForSelector('.new-items');
const items = await page.$$eval('.item', elements => {
return elements.map(el => el.textContent);
});
await browser.close();
return items;
}
```
**7. 实际应用场景**
**场景 1:抓取社交媒体动态内容**
```javascript
async function scrapeSocialMediaPosts(username) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(`https://social-media.com/${username}`);
const posts = [];
// 滚动加载更多帖子
while (posts.length < 50) {
// 滚动到底部
await page.evaluate(() => {
window.scrollBy(0, window.innerHeight);
});
// 等待新帖子加载
await page.waitForTimeout(2000);
// 收集帖子数据
const newPosts = await page.$$eval('.post', elements => {
return elements.map(post => ({
id: post.dataset.id,
content: post.querySelector('.content')?.textContent,
likes: post.querySelector('.likes')?.textContent,
timestamp: post.querySelector('.timestamp')?.textContent
}));
});
// 只添加新帖子
const newPostIds = new Set(posts.map(p => p.id));
const uniqueNewPosts = newPosts.filter(p => !newPostIds.has(p.id));
posts.push(...uniqueNewPosts);
}
await browser.close();
return posts;
}
```
**场景 2:抓取电商网站商品列表**
```javascript
async function scrapeEcommerceProducts(categoryUrl) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(categoryUrl);
const products = [];
while (true) {
// 等待商品加载
await page.waitForSelector('.product-card');
// 收集当前页商品
const pageProducts = await page.$$eval('.product-card', cards => {
return cards.map(card => ({
id: card.dataset.id,
title: card.querySelector('.title')?.textContent,
price: card.querySelector('.price')?.textContent,
rating: card.querySelector('.rating')?.textContent
}));
});
products.push(...pageProducts);
// 检查是否有下一页
const nextButton = await page.$('.next-page:not(.disabled)');
if (!nextButton) {
break;
}
// 点击下一页
await nextButton.click();
await page.waitForTimeout(1000);
}
await browser.close();
return products;
}
```
**场景 3:抓取实时数据更新**
```javascript
async function scrapeRealTimeData(url) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const dataUpdates = [];
// 监听 DOM 变化
await page.evaluate(() => {
const observer = new MutationObserver((mutations) => {
mutations.forEach((mutation) => {
if (mutation.type === 'childList') {
window.__DATA_UPDATES__ = window.__DATA_UPDATES__ || [];
window.__DATA_UPDATES__.push({
timestamp: Date.now(),
addedNodes: mutation.addedNodes.length
});
}
});
});
observer.observe(document.body, {
childList: true,
subtree: true
});
});
// 等待一段时间收集数据
await page.waitForTimeout(30000);
// 获取收集的数据
const updates = await page.evaluate(() => {
return window.__DATA_UPDATES__ || [];
});
await browser.close();
return updates;
}
```
**8. 最佳实践**
**1. 使用适当的等待策略:**
```javascript
// 优先使用 waitForSelector
await page.waitForSelector('.element');
// 复杂条件使用 waitForFunction
await page.waitForFunction(() => {
return document.querySelectorAll('.item').length > 10;
});
// 网络请求使用 waitForResponse
await page.waitForResponse(response =>
response.url().includes('/api/data')
);
```
**2. 避免硬编码等待时间:**
```javascript
// 不好的做法
await page.waitForTimeout(5000);
// 好的做法
await page.waitForSelector('.loaded-content');
```
**3. 处理加载失败:**
```javascript
try {
await page.waitForSelector('.content', { timeout: 10000 });
} catch (error) {
console.log('Content failed to load, using fallback');
// 使用备用策略
}
```
**4. 优化性能:**
```javascript
// 禁用不必要的资源
await page.setRequestInterception(true);
page.on('request', (request) => {
if (['image', 'font', 'media'].includes(request.resourceType())) {
request.abort();
} else {
request.continue();
}
});
```
**5. 处理反爬虫:**
```javascript
// 设置真实的用户代理
await page.setUserAgent('Mozilla/5.0 ...');
// 添加随机延迟
const randomDelay = () => Math.random() * 2000 + 1000;
await page.waitForTimeout(randomDelay());
// 模拟人类行为
await page.evaluate(() => {
window.scrollBy(0, Math.random() * 500);
});
```
前端 · 2月19日 19:49
Puppeteer 在实际项目中有哪些应用场景?请举例说明网页爬虫、自动化测试等具体实现。Puppeteer 在实际项目中有广泛的应用场景,从网页爬虫到自动化测试,从数据采集到性能监控。以下是一些典型的实际应用案例。
**1. 网页爬虫和数据采集**
**案例 1:电商商品价格监控**
```javascript
const puppeteer = require('puppeteer');
async function monitorProductPrices(productUrls) {
const browser = await puppeteer.launch({
headless: 'new',
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
const results = [];
for (const url of productUrls) {
const page = await browser.newPage();
// 设置用户代理,避免被识别为爬虫
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');
await page.goto(url, { waitUntil: 'networkidle2' });
// 等待价格元素加载
await page.waitForSelector('.price', { timeout: 5000 });
const productData = await page.evaluate(() => {
return {
title: document.querySelector('.product-title')?.textContent,
price: document.querySelector('.price')?.textContent,
availability: document.querySelector('.availability')?.textContent,
rating: document.querySelector('.rating')?.textContent
};
});
results.push({
url,
...productData,
timestamp: new Date().toISOString()
});
await page.close();
}
await browser.close();
return results;
}
// 使用示例
const products = [
'https://example.com/product/1',
'https://example.com/product/2'
];
monitorProductPrices(products).then(data => {
console.log(JSON.stringify(data, null, 2));
});
```
**案例 2:社交媒体数据抓取**
```javascript
async function scrapeSocialMedia(username) {
const browser = await puppeteer.launch({
headless: 'new'
});
const page = await browser.newPage();
// 模拟登录
await page.goto('https://social-media.com/login');
await page.type('#username', 'your_username');
await page.type('#password', 'your_password');
await page.click('#login-button');
await page.waitForNavigation();
// 访问用户页面
await page.goto(`https://social-media.com/${username}`);
// 滚动加载更多内容
while (true) {
await page.evaluate(() => {
window.scrollBy(0, window.innerHeight);
});
try {
await page.waitForSelector('.new-post', { timeout: 2000 });
} catch {
break;
}
}
// 抓取帖子数据
const posts = await page.evaluate(() => {
return Array.from(document.querySelectorAll('.post')).map(post => ({
content: post.querySelector('.content')?.textContent,
likes: post.querySelector('.likes')?.textContent,
comments: post.querySelector('.comments')?.textContent,
date: post.querySelector('.date')?.textContent
}));
});
await browser.close();
return posts;
}
```
**2. 自动化测试**
**案例 3:E2E 测试**
```javascript
const { expect } = require('expect-puppeteer');
async function runE2ETest() {
const browser = await puppeteer.launch({
headless: 'new',
slowMo: 50 // 减慢操作速度,便于观察
});
const page = await browser.newPage();
try {
// 测试用户注册流程
await page.goto('https://example.com/register');
// 填写注册表单
await page.type('#username', 'testuser');
await page.type('#email', 'test@example.com');
await page.type('#password', 'password123');
await page.type('#confirm-password', 'password123');
// 提交表单
await Promise.all([
page.waitForNavigation(),
page.click('#register-button')
]);
// 验证注册成功
await expect(page).toMatch('Welcome, testuser!');
// 测试登录流程
await page.click('#logout-button');
await page.waitForNavigation();
await page.type('#login-email', 'test@example.com');
await page.type('#login-password', 'password123');
await page.click('#login-button');
await page.waitForNavigation();
// 验证登录成功
await expect(page).toMatch('Welcome back!');
console.log('E2E test passed!');
} catch (error) {
console.error('E2E test failed:', error);
// 保存失败截图
await page.screenshot({ path: 'test-failure.png' });
} finally {
await browser.close();
}
}
runE2ETest();
```
**案例 4:视觉回归测试**
```javascript
const fs = require('fs');
const pixelmatch = require('pixelmatch');
const { PNG } = require('pngjs');
async function visualRegressionTest(url, baselinePath) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url, { waitUntil: 'networkidle2' });
// 截取当前页面
const screenshot = await page.screenshot();
await browser.close();
// 如果没有基线图片,保存当前截图作为基线
if (!fs.existsSync(baselinePath)) {
fs.writeFileSync(baselinePath, screenshot);
console.log('Baseline image created');
return true;
}
// 读取基线图片
const baseline = PNG.sync.read(fs.readFileSync(baselinePath));
const current = PNG.sync.read(screenshot);
// 比较图片差异
const diff = new PNG({ width: baseline.width, height: baseline.height });
const numDiffPixels = pixelmatch(
baseline.data,
current.data,
diff.data,
baseline.width,
baseline.height,
{ threshold: 0.1 }
);
// 保存差异图片
fs.writeFileSync('diff.png', PNG.sync.write(diff));
const totalPixels = baseline.width * baseline.height;
const diffPercentage = (numDiffPixels / totalPixels) * 100;
console.log(`Difference: ${diffPercentage.toFixed(2)}%`);
// 如果差异超过阈值,测试失败
if (diffPercentage > 0.5) {
console.log('Visual regression detected!');
return false;
}
console.log('Visual regression test passed!');
return true;
}
visualRegressionTest('https://example.com', 'baseline.png');
```
**3. PDF 生成和文档处理**
**案例 5:动态报表生成**
```javascript
async function generateReport(data, outputPath) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// 生成 HTML 报表
const html = `
<!DOCTYPE html>
<html>
<head>
<style>
body { font-family: Arial, sans-serif; padding: 40px; }
h1 { color: #333; }
table { width: 100%; border-collapse: collapse; margin-top: 20px; }
th, td { border: 1px solid #ddd; padding: 12px; text-align: left; }
th { background-color: #f2f2f2; }
.summary { margin-top: 30px; padding: 20px; background-color: #f9f9f9; }
</style>
</head>
<body>
<h1>销售报表</h1>
<p>生成时间: ${new Date().toLocaleString()}</p>
<table>
<thead>
<tr>
<th>产品</th>
<th>数量</th>
<th>单价</th>
<th>总价</th>
</tr>
</thead>
<tbody>
${data.map(item => `
<tr>
<td>${item.product}</td>
<td>${item.quantity}</td>
<td>$${item.price.toFixed(2)}</td>
<td>$${(item.quantity * item.price).toFixed(2)}</td>
</tr>
`).join('')}
</tbody>
</table>
<div class="summary">
<h2>总计: $${data.reduce((sum, item) => sum + item.quantity * item.price, 0).toFixed(2)}</h2>
</div>
</body>
</html>
`;
await page.setContent(html);
// 生成 PDF
await page.pdf({
path: outputPath,
format: 'A4',
printBackground: true,
margin: {
top: '20px',
right: '20px',
bottom: '20px',
left: '20px'
}
});
await browser.close();
console.log(`Report generated: ${outputPath}`);
}
// 使用示例
const salesData = [
{ product: '产品 A', quantity: 10, price: 99.99 },
{ product: '产品 B', quantity: 5, price: 149.99 },
{ product: '产品 C', quantity: 8, price: 79.99 }
];
generateReport(salesData, 'sales-report.pdf');
```
**案例 6:发票批量生成**
```javascript
async function generateInvoices(invoices) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
for (const invoice of invoices) {
const html = `
<!DOCTYPE html>
<html>
<head>
<style>
body { font-family: Arial, sans-serif; padding: 40px; }
.header { text-align: center; margin-bottom: 40px; }
.invoice-info { margin-bottom: 30px; }
table { width: 100%; border-collapse: collapse; }
th, td { border: 1px solid #ddd; padding: 10px; text-align: left; }
th { background-color: #f2f2f2; }
.total { text-align: right; font-weight: bold; margin-top: 20px; }
</style>
</head>
<body>
<div class="header">
<h1>发票</h1>
<p>发票号: ${invoice.number}</p>
</div>
<div class="invoice-info">
<p>日期: ${invoice.date}</p>
<p>客户: ${invoice.customer}</p>
</div>
<table>
<thead>
<tr>
<th>项目</th>
<th>数量</th>
<th>单价</th>
<th>总价</th>
</tr>
</thead>
<tbody>
${invoice.items.map(item => `
<tr>
<td>${item.name}</td>
<td>${item.quantity}</td>
<td>$${item.price}</td>
<td>$${item.quantity * item.price}</td>
</tr>
`).join('')}
</tbody>
</table>
<div class="total">
总计: $${invoice.total}
</div>
</body>
</html>
`;
await page.setContent(html);
await page.pdf({
path: `invoices/invoice-${invoice.number}.pdf`,
format: 'A4',
printBackground: true
});
console.log(`Generated invoice: ${invoice.number}`);
}
await browser.close();
}
// 使用示例
const invoices = [
{
number: 'INV-001',
date: '2024-01-15',
customer: '客户 A',
items: [
{ name: '服务 A', quantity: 1, price: 500 },
{ name: '服务 B', quantity: 2, price: 300 }
],
total: 1100
}
];
generateInvoices(invoices);
```
**4. 性能监控和分析**
**案例 7:页面性能分析**
```javascript
async function analyzePagePerformance(url) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// 启用性能监控
const client = await page.target().createCDPSession();
await client.send('Performance.enable');
await client.send('Network.enable');
// 记录开始时间
const startTime = Date.now();
await page.goto(url, { waitUntil: 'networkidle2' });
const loadTime = Date.now() - startTime;
// 获取性能指标
const metrics = await client.send('Performance.getMetrics');
// 获取关键性能指标
const performanceData = {
loadTime,
domContentLoaded: await page.evaluate(() =>
performance.timing.domContentLoadedEventEnd - performance.timing.navigationStart
),
firstPaint: await page.evaluate(() =>
performance.getEntriesByType('paint')[0]?.startTime
),
firstContentfulPaint: await page.evaluate(() =>
performance.getEntriesByType('paint')[1]?.startTime
),
resources: metrics.metrics
};
// 生成性能报告
console.log('Performance Report:');
console.log(`Load Time: ${performanceData.loadTime}ms`);
console.log(`DOM Content Loaded: ${performanceData.domContentLoaded}ms`);
console.log(`First Paint: ${performanceData.firstPaint}ms`);
console.log(`First Contentful Paint: ${performanceData.firstContentfulPaint}ms`);
await browser.close();
return performanceData;
}
analyzePagePerformance('https://example.com');
```
**5. SEO 工具**
**案例 8:SEO 检查工具**
```javascript
async function seoAudit(url) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url, { waitUntil: 'networkidle2' });
const seoData = await page.evaluate(() => {
const issues = [];
const warnings = [];
// 检查标题
const title = document.querySelector('title');
if (!title) {
issues.push('Missing title tag');
} else if (title.textContent.length > 60) {
warnings.push('Title too long (> 60 characters)');
}
// 检查描述
const description = document.querySelector('meta[name="description"]');
if (!description) {
issues.push('Missing meta description');
} else if (description.content.length > 160) {
warnings.push('Meta description too long (> 160 characters)');
}
// 检查 H1 标签
const h1Tags = document.querySelectorAll('h1');
if (h1Tags.length === 0) {
issues.push('Missing H1 tag');
} else if (h1Tags.length > 1) {
warnings.push('Multiple H1 tags found');
}
// 检查图片 alt 属性
const images = document.querySelectorAll('img');
let missingAlt = 0;
images.forEach(img => {
if (!img.alt) missingAlt++;
});
if (missingAlt > 0) {
warnings.push(`${missingAlt} images missing alt attributes`);
}
// 检查链接
const links = document.querySelectorAll('a[href]');
let brokenLinks = 0;
links.forEach(link => {
if (link.getAttribute('href').startsWith('#')) brokenLinks++;
});
return {
title: title?.textContent,
description: description?.content,
h1Count: h1Tags.length,
imageCount: images.length,
linkCount: links.length,
issues,
warnings
};
});
console.log('SEO Audit Results:');
console.log(JSON.stringify(seoData, null, 2));
await browser.close();
return seoData;
}
seoAudit('https://example.com');
```
**6. 最佳实践总结**
**1. 错误处理:**
```javascript
try {
// 操作代码
} catch (error) {
console.error('Error:', error);
// 保存错误截图
await page.screenshot({ path: 'error.png' });
} finally {
await browser.close();
}
```
**2. 资源管理:**
```javascript
// 及时清理资源
await page.close();
await browser.close();
```
**3. 性能优化:**
```javascript
// 禁用不必要的资源
await page.setRequestInterception(true);
page.on('request', (request) => {
if (['image', 'font'].includes(request.resourceType())) {
request.abort();
} else {
request.continue();
}
});
```
**4. 反爬虫策略:**
```javascript
// 设置真实的用户代理
await page.setUserAgent('Mozilla/5.0 ...');
// 添加延迟
await new Promise(resolve => setTimeout(resolve, 1000));
// 使用代理
const browser = await puppeteer.launch({
args: ['--proxy-server=http://proxy.example.com:8080']
});
```
前端 · 2月19日 19:48
Puppeteer 如何实现页面交互和表单操作?有哪些常用的 API 和最佳实践?Puppeteer 提供了丰富的页面交互和表单操作功能,可以模拟用户的真实操作行为,这对于自动化测试和网页爬虫非常重要。
**1. 基本页面操作**
**导航到页面:**
```javascript
// 基本导航
await page.goto('https://example.com');
// 等待网络空闲
await page.goto('https://example.com', {
waitUntil: 'networkidle2'
});
// 设置超时时间
await page.goto('https://example.com', {
timeout: 30000
});
// 等待特定条件
await page.goto('https://example.com', {
waitUntil: ['load', 'domcontentloaded']
});
```
**刷新页面:**
```javascript
await page.reload();
await page.reload({ waitUntil: 'networkidle2' });
```
**前进和后退:**
```javascript
await page.goBack();
await page.goForward();
```
**2. 元素选择**
Puppeteer 支持多种选择器方式。
**使用 $ 选择单个元素:**
```javascript
// 通过 CSS 选择器
const element = await page.$('#my-id');
const element = await page.$('.my-class');
const element = await page.$('div > p');
// 通过 XPath
const element = await page.$x('//div[@class="my-class"]');
```
**使用 $$ 选择多个元素:**
```javascript
// 选择所有匹配的元素
const elements = await page.$$('.item');
console.log(elements.length); // 元素数量
// 遍历元素
for (const element of elements) {
const text = await element.evaluate(el => el.textContent);
console.log(text);
}
```
**使用 $$eval 批量获取数据:**
```javascript
// 获取所有元素的文本
const texts = await page.$$eval('.item', elements => {
return elements.map(el => el.textContent);
});
// 获取所有元素的属性
const hrefs = await page.$$eval('a', elements => {
return elements.map(el => el.href);
});
```
**3. 点击操作**
**基本点击:**
```javascript
await page.click('#button');
await page.click('.submit-btn');
```
**带选项的点击:**
```javascript
await page.click('#button', {
button: 'left', // 'left', 'right', 'middle'
clickCount: 1, // 点击次数
delay: 100, // 点击延迟(毫秒)
offset: { // 点击位置偏移
x: 10,
y: 10
}
});
```
**双击:**
```javascript
await page.click('#button', { clickCount: 2 });
```
**右键点击:**
```javascript
await page.click('#button', { button: 'right' });
```
**等待元素可点击:**
```javascript
await page.waitForSelector('#button', { visible: true });
await page.click('#button');
```
**4. 文本输入**
**基本输入:**
```javascript
await page.type('#input', 'Hello World');
```
**带选项的输入:**
```javascript
await page.type('#input', 'Hello World', {
delay: 100, // 每个字符的延迟(毫秒)
clear: true // 输入前清空输入框
});
```
**模拟真实打字速度:**
```javascript
await page.type('#input', 'Hello World', { delay: 50 });
```
**清空输入框:**
```javascript
await page.click('#input');
await page.keyboard.down('Control');
await page.keyboard.press('A');
await page.keyboard.up('Control');
await page.keyboard.press('Backspace');
```
**5. 键盘操作**
**基本按键:**
```javascript
await page.keyboard.press('Enter');
await page.keyboard.press('Tab');
await page.keyboard.press('Escape');
await page.keyboard.press('Backspace');
```
**组合键:**
```javascript
// Ctrl+C
await page.keyboard.down('Control');
await page.keyboard.press('C');
await page.keyboard.up('Control');
// Ctrl+A (全选)
await page.keyboard.down('Control');
await page.keyboard.press('A');
await page.keyboard.up('Control');
// Ctrl+V (粘贴)
await page.keyboard.down('Control');
await page.keyboard.press('V');
await page.keyboard.up('Control');
```
**特殊键:**
```javascript
await page.keyboard.press('ArrowUp');
await page.keyboard.press('ArrowDown');
await page.keyboard.press('ArrowLeft');
await page.keyboard.press('ArrowRight');
await page.keyboard.press('PageUp');
await page.keyboard.press('PageDown');
await page.keyboard.press('Home');
await page.keyboard.press('End');
```
**6. 鼠标操作**
**移动鼠标:**
```javascript
await page.mouse.move(100, 100);
await page.mouse.move(100, 100, { steps: 10 }); // 平滑移动
```
**点击鼠标:**
```javascript
await page.mouse.click(100, 100);
await page.mouse.click(100, 100, {
button: 'left',
clickCount: 1
});
```
**按下和释放鼠标:**
```javascript
await page.mouse.down();
await page.mouse.up();
// 拖拽操作
await page.mouse.down({ x: 100, y: 100 });
await page.mouse.move(200, 200, { steps: 10 });
await page.mouse.up();
```
**7. 表单操作**
**填写表单:**
```javascript
// 文本输入
await page.type('#name', 'John Doe');
await page.type('#email', 'john@example.com');
// 选择下拉框
await page.selectOption('#country', 'CN');
await page.selectOption('#country', ['CN', 'US']); // 多选
// 复选框
await page.click('#checkbox');
const isChecked = await page.$eval('#checkbox', el => el.checked);
// 单选框
await page.click('#radio-male');
// 文件上传
await page.setInputFiles('#file-upload', '/path/to/file.pdf');
await page.setInputFiles('#file-upload', ['/file1.pdf', '/file2.pdf']);
```
**提交表单:**
```javascript
// 点击提交按钮
await page.click('#submit-button');
// 使用表单提交
await page.evaluate(() => {
document.querySelector('form').submit();
});
```
**8. 滚动操作**
**滚动到页面底部:**
```javascript
await page.evaluate(() => {
window.scrollTo(0, document.body.scrollHeight);
});
```
**滚动到特定元素:**
```javascript
await page.evaluate(() => {
document.querySelector('#target').scrollIntoView();
});
```
**平滑滚动:**
```javascript
await page.evaluate(() => {
window.scrollTo({
top: 1000,
behavior: 'smooth'
});
});
```
**滚动指定距离:**
```javascript
await page.evaluate(() => {
window.scrollBy(0, 500);
});
```
**9. 等待元素**
**等待元素出现:**
```javascript
await page.waitForSelector('.result');
await page.waitForSelector('.result', { visible: true });
await page.waitForSelector('.result', { hidden: true });
```
**等待 XPath:**
```javascript
await page.waitForXPath('//div[@class="result"]');
```
**等待函数:**
```javascript
await page.waitForFunction(() => {
return document.querySelectorAll('.item').length > 5;
});
```
**等待导航:**
```javascript
await Promise.all([
page.waitForNavigation(),
page.click('#link')
]);
```
**10. 获取元素信息**
**获取文本内容:**
```javascript
const text = await page.$eval('.title', el => el.textContent);
```
**获取属性:**
```javascript
const href = await page.$eval('a', el => el.href);
const id = await page.$eval('div', el => el.id);
```
**获取多个元素信息:**
```javascript
const texts = await page.$$eval('.item', elements => {
return elements.map(el => el.textContent);
});
```
**检查元素是否存在:**
```javascript
const exists = await page.$('.element') !== null;
```
**检查元素是否可见:**
```javascript
const isVisible = await page.$eval('.element', el => {
return el.offsetParent !== null;
});
```
**11. 实际应用场景**
**场景 1:登录表单填写**
```javascript
async function login(url, username, password) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
// 填写登录表单
await page.type('#username', username);
await page.type('#password', password);
// 点击登录按钮
await Promise.all([
page.waitForNavigation(),
page.click('#login-button')
]);
// 验证登录成功
const isLoggedIn = await page.$('.user-profile') !== null;
await browser.close();
return isLoggedIn;
}
login('https://example.com/login', 'user@example.com', 'password');
```
**场景 2:搜索功能测试**
```javascript
async function testSearch(url, query) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
// 输入搜索关键词
await page.type('#search-input', query);
// 提交搜索
await Promise.all([
page.waitForNavigation(),
page.keyboard.press('Enter')
]);
// 等待搜索结果
await page.waitForSelector('.search-result');
// 获取结果数量
const resultCount = await page.$$eval('.search-result', results => {
return results.length;
});
console.log(`Found ${resultCount} results for "${query}"`);
await browser.close();
return resultCount;
}
testSearch('https://example.com', 'puppeteer');
```
**场景 3:分页数据抓取**
```javascript
async function scrapePaginatedData(url) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
const allData = [];
let hasNextPage = true;
let pageNum = 1;
while (hasNextPage) {
await page.goto(`${url}?page=${pageNum}`);
await page.waitForSelector('.item');
// 抓取当前页数据
const pageData = await page.$$eval('.item', items => {
return items.map(item => ({
title: item.querySelector('.title').textContent,
price: item.querySelector('.price').textContent
}));
});
allData.push(...pageData);
console.log(`Scraped page ${pageNum}: ${pageData.length} items`);
// 检查是否有下一页
hasNextPage = await page.$('.next-page:not(.disabled)') !== null;
pageNum++;
}
await browser.close();
return allData;
}
scrapePaginatedData('https://example.com/products');
```
**场景 4:动态内容加载**
```javascript
async function scrapeDynamicContent(url) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
// 等待初始内容加载
await page.waitForSelector('.content');
// 滚动加载更多内容
while (true) {
// 滚动到底部
await page.evaluate(() => {
window.scrollTo(0, document.body.scrollHeight);
});
// 等待新内容加载
try {
await page.waitForSelector('.new-content', { timeout: 3000 });
} catch (error) {
break; // 没有新内容了
}
}
// 获取所有内容
const allContent = await page.$$eval('.content-item', items => {
return items.map(item => item.textContent);
});
await browser.close();
return allContent;
}
scrapeDynamicContent('https://example.com/infinite-scroll');
```
**12. 最佳实践**
**1. 使用等待机制:**
```javascript
// 好的做法
await page.waitForSelector('.button', { visible: true });
await page.click('.button');
// 不好的做法
await page.click('.button'); // 可能失败
```
**2. 处理动态内容:**
```javascript
// 等待网络空闲
await page.goto(url, { waitUntil: 'networkidle2' });
// 等待特定元素
await page.waitForSelector('.loaded-content');
```
**3. 错误处理:**
```javascript
try {
await page.click('#button');
} catch (error) {
console.error('Click failed:', error);
// 重试逻辑
}
```
**4. 性能优化:**
```javascript
// 禁用不必要的资源
await page.setRequestInterception(true);
page.on('request', (request) => {
if (['image', 'font', 'media'].includes(request.resourceType())) {
request.abort();
} else {
request.continue();
}
});
```
**5. 清理资源:**
```javascript
try {
// 操作代码
} finally {
await browser.close();
}
```
前端 · 2月19日 19:48
Puppeteer 和 Selenium 有什么区别?在什么场景下应该选择 Puppeteer 而不是 Selenium?Puppeteer 和 Selenium 都是流行的浏览器自动化工具,但它们在设计理念、实现方式和使用场景上有显著差异。
**1. 架构差异**
**Puppeteer:**
- 基于 Chrome DevTools Protocol (CDP)
- 直接与浏览器通信,无需中间层
- 专为 Chrome/Chromium 设计
- 使用 WebSocket 与浏览器建立连接
**Selenium:**
- 基于 WebDriver 协议
- 通过 WebDriver 服务器与浏览器通信
- 支持多种浏览器(Chrome、Firefox、Safari、Edge 等)
- 需要安装浏览器驱动程序
**2. 性能对比**
**Puppeteer:**
```javascript
// 启动速度快
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com'); // 快速加载
```
**Selenium:**
```javascript
// 启动较慢
const driver = await new Builder()
.forBrowser('chrome')
.build();
await driver.get('https://example.com'); // 加载较慢
```
**性能指标对比:**
| 指标 | Puppeteer | Selenium |
|------|-----------|----------|
| 启动时间 | 快(1-2秒) | 慢(3-5秒) |
| 执行速度 | 快 | 中等 |
| 内存占用 | 较低 | 较高 |
| 网络请求 | 直接通信 | 通过驱动 |
**3. API 设计**
**Puppeteer API:**
```javascript
// 简洁直观的 API
await page.click('#button');
await page.type('#input', 'text');
await page.waitForSelector('.result');
const text = await page.$eval('.title', el => el.textContent);
```
**Selenium API:**
```javascript
// 相对复杂的 API
await driver.findElement(By.id('button')).click();
await driver.findElement(By.id('input')).sendKeys('text');
await driver.wait(until.elementLocated(By.css('.result')));
const text = await driver.findElement(By.css('.title')).getText();
```
**4. 浏览器支持**
**Puppeteer:**
- Chrome/Chromium(主要支持)
- Firefox(实验性支持,通过 puppeteer-firefox)
- 其他浏览器支持有限
**Selenium:**
- Chrome
- Firefox
- Safari
- Edge
- Opera
- Internet Explorer
- 支持几乎所有主流浏览器
**5. 功能特性对比**
**Puppeteer 特有功能:**
```javascript
// 1. 网络拦截
await page.setRequestInterception(true);
page.on('request', request => {
if (request.resourceType() === 'image') {
request.abort();
} else {
request.continue();
}
});
// 2. 性能追踪
const client = await page.target().createCDPSession();
await client.send('Performance.enable');
const metrics = await client.send('Performance.getMetrics');
// 3. 文件下载
const [download] = await Promise.all([
page.waitForEvent('download'),
page.click('#download-button')
]);
await download.saveAs('/path/to/save');
// 4. 设备模拟
const devices = puppeteer.devices;
const iPhone = devices['iPhone 12'];
await page.emulate(iPhone);
// 5. 地理位置模拟
await page.setGeolocation({ latitude: 35.6895, longitude: 139.6917 });
```
**Selenium 特有功能:**
```javascript
// 1. 多浏览器支持
const driver = await new Builder()
.forBrowser('firefox')
.build();
// 2. 分布式测试(Selenium Grid)
// 可以在多台机器上并行运行测试
// 3. 移动设备测试(Appium)
// 支持原生移动应用测试
// 4. 高级等待机制
await driver.wait(
until.titleIs('Expected Title'),
5000,
'Title did not match'
);
// 5. Actions API(复杂交互)
await driver.actions()
.move({ origin: element })
.press()
.move({ origin: targetElement })
.release()
.perform();
```
**6. 使用场景**
**Puppeteer 适用场景:**
- 网页爬虫和数据抓取
- 生成截图和 PDF
- 性能测试和监控
- CI/CD 自动化测试
- SPA(单页应用)测试
- 需要网络拦截的场景
**Selenium 适用场景:**
- 跨浏览器兼容性测试
- 大型企业级测试框架
- 分布式测试环境
- 需要支持多种浏览器的项目
- 移动应用测试(配合 Appium)
- 传统 Web 应用测试
**7. 学习曲线**
**Puppeteer:**
- API 简洁直观
- 文档清晰易懂
- 学习曲线较平缓
- 适合初学者
**Selenium:**
- API 相对复杂
- 需要理解 WebDriver 概念
- 学习曲线较陡峭
- 需要更多配置
**8. 社区和生态系统**
**Puppeteer:**
- Google 官方维护
- 活跃的 GitHub 社区
- 丰富的插件生态
- 持续更新和改进
**Selenium:**
- 开源社区维护
- 成熟的生态系统
- 大量第三方工具和集成
- 广泛的企业应用
**9. 实际代码对比**
**任务:登录并获取用户信息**
**Puppeteer 实现:**
```javascript
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com/login');
// 填写表单
await page.type('#username', 'user@example.com');
await page.type('#password', 'password123');
// 提交表单并等待导航
await Promise.all([
page.waitForNavigation(),
page.click('#login-button')
]);
// 获取用户信息
const userInfo = await page.evaluate(() => {
return {
name: document.querySelector('.user-name').textContent,
email: document.querySelector('.user-email').textContent
};
});
console.log(userInfo);
await browser.close();
})();
```
**Selenium 实现:**
```javascript
const { Builder, By, until } = require('selenium-webdriver');
(async () => {
const driver = await new Builder()
.forBrowser('chrome')
.build();
await driver.get('https://example.com/login');
// 填写表单
await driver.findElement(By.id('username')).sendKeys('user@example.com');
await driver.findElement(By.id('password')).sendKeys('password123');
// 提交表单并等待导航
await Promise.all([
driver.wait(until.titleContains('Dashboard'), 5000),
driver.findElement(By.id('login-button')).click()
]);
// 获取用户信息
const userInfo = {
name: await driver.findElement(By.css('.user-name')).getText(),
email: await driver.findElement(By.css('.user-email')).getText()
};
console.log(userInfo);
await driver.quit();
})();
```
**10. 选择建议**
**选择 Puppeteer 如果:**
- 主要使用 Chrome/Chromium
- 需要高性能和快速执行
- 需要网络拦截或性能分析
- 项目规模较小或中等
- 团队熟悉 Node.js
- 需要生成截图或 PDF
**选择 Selenium 如果:**
- 需要支持多种浏览器
- 需要跨浏览器兼容性测试
- 项目规模较大或企业级
- 需要分布式测试环境
- 需要测试移动应用
- 团队已有 Selenium 经验
**11. 混合使用策略**
在某些项目中,可以结合两者的优势:
```javascript
// 使用 Puppeteer 进行快速开发和测试
const puppeteer = require('puppeteer');
// 使用 Selenium 进行跨浏览器验证
const { Builder, By } = require('selenium-webdriver');
async function testWithPuppeteer() {
// 快速测试主要功能
}
async function testWithSelenium() {
// 跨浏览器兼容性测试
}
```
**总结:**
Puppeteer 和 Selenium 各有优势,选择哪个工具取决于项目需求、团队技能和测试场景。Puppeteer 更适合现代 Web 应用和快速开发,而 Selenium 更适合需要跨浏览器支持的企业级测试框架。
前端 · 2月19日 19:47
Puppeteer 如何使用 Chrome DevTools Protocol (CDP) 进行高级调试和性能分析?Puppeteer 提供了丰富的 Chrome DevTools Protocol (CDP) 功能,允许开发者访问浏览器底层的调试和性能分析能力。
**1. CDP 基础**
**创建 CDP 会话:**
```javascript
const client = await page.target().createCDPSession();
```
**启用 CDP 域:**
```javascript
await client.send('Performance.enable');
await client.send('Network.enable');
await client.send('Runtime.enable');
```
**发送 CDP 命令:**
```javascript
const result = await client.send('Performance.getMetrics');
console.log(result);
```
**监听 CDP 事件:**
```javascript
client.on('Network.requestWillBeSent', (params) => {
console.log('Request:', params.request.url);
});
```
**2. 性能监控**
**启用性能监控:**
```javascript
const client = await page.target().createCDPSession();
await client.send('Performance.enable');
```
**获取性能指标:**
```javascript
const metrics = await client.send('Performance.getMetrics');
console.log('Performance Metrics:', metrics.metrics);
```
**关键性能指标:**
```javascript
const metrics = await client.send('Performance.getMetrics');
const metricMap = {};
metrics.metrics.forEach(m => metricMap[m.name] = m.value);
console.log({
Timestamp: metricMap.Timestamp,
Documents: metricMap.Documents,
Frames: metricMap.Frames,
JSEventListeners: metricMap.JSEventListeners,
Nodes: metricMap.Nodes,
LayoutCount: metricMap.LayoutCount,
RecalcStyleCount: metricMap.RecalcStyleCount,
LayoutDuration: metricMap.LayoutDuration,
RecalcStyleDuration: metricMap.RecalcStyleDuration,
ScriptDuration: metricMap.ScriptDuration,
TaskDuration: metricMap.TaskDuration
});
```
**性能追踪:**
```javascript
// 开始追踪
await client.send('Performance.enable');
await client.send('Tracing.start', {
traceConfig: {
includedCategories: ['devtools.timeline', 'blink.user_timing']
}
});
// 执行操作
await page.goto('https://example.com');
// 停止追踪
const traceData = await client.send('Tracing.stop');
```
**3. 网络监控**
**启用网络监控:**
```javascript
const client = await page.target().createCDPSession();
await client.send('Network.enable');
```
**监控网络请求:**
```javascript
client.on('Network.requestWillBeSent', (params) => {
console.log('Request:', {
url: params.request.url,
method: params.request.method,
type: params.type
});
});
```
**监控网络响应:**
```javascript
client.on('Network.responseReceived', (params) => {
console.log('Response:', {
url: params.response.url,
status: params.response.status,
mimeType: params.response.mimeType
});
});
```
**获取请求体:**
```javascript
client.on('Network.requestWillBeSent', async (params) => {
if (params.request.postData) {
console.log('Request body:', params.request.postData);
}
});
```
**获取响应体:**
```javascript
client.on('Network.responseReceived', async (params) => {
const responseBody = await client.send('Network.getResponseBody', {
requestId: params.requestId
});
console.log('Response body:', responseBody.body);
});
```
**4. 运行时调试**
**启用运行时监控:**
```javascript
const client = await page.target().createCDPSession();
await client.send('Runtime.enable');
```
**执行 JavaScript:**
```javascript
const result = await client.send('Runtime.evaluate', {
expression: 'document.title'
});
console.log('Result:', result.result.value);
```
**获取控制台日志:**
```javascript
client.on('Runtime.consoleAPICalled', (params) => {
console.log('Console:', params.type, params.args);
});
```
**监听异常:**
```javascript
client.on('Runtime.exceptionThrown', (params) => {
console.error('Exception:', params.exceptionDetails);
});
```
**5. DOM 监控**
**启用 DOM 监控:**
```javascript
const client = await page.target().createCDPSession();
await client.send('DOM.enable');
```
**获取文档根节点:**
```javascript
const root = await client.send('DOM.getDocument');
console.log('Root node:', root.root);
```
**查询节点:**
```javascript
const result = await client.send('DOM.querySelector', {
nodeId: root.root.nodeId,
selector: '.my-element'
});
console.log('Node:', result.nodeId);
```
**获取节点属性:**
```javascript
const attributes = await client.send('DOM.getAttributes', {
nodeId: result.nodeId
});
console.log('Attributes:', attributes.attributes);
```
**6. Page 监控**
**启用 Page 监控:**
```javascript
const client = await page.target().createCDPSession();
await client.send('Page.enable');
```
**监听页面加载:**
```javascript
client.on('Page.loadEventFired', () => {
console.log('Page loaded');
});
```
**监听导航:**
```javascript
client.on('Page.frameNavigated', (params) => {
console.log('Navigated to:', params.frame.url);
});
```
**获取页面资源树:**
```javascript
const resourceTree = await client.send('Page.getResourceTree');
console.log('Resource tree:', resourceTree);
```
**7. 实际应用场景**
**场景 1:性能分析工具**
```javascript
async function analyzePerformance(url) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
const client = await page.target().createCDPSession();
// 启用性能监控
await client.send('Performance.enable');
await client.send('Network.enable');
const startTime = Date.now();
await page.goto(url, { waitUntil: 'networkidle2' });
const loadTime = Date.now() - startTime;
// 获取性能指标
const metrics = await client.send('Performance.getMetrics');
const metricMap = {};
metrics.metrics.forEach(m => metricMap[m.name] = m.value);
// 收集网络数据
const networkData = [];
client.on('Network.requestWillBeSent', (params) => {
networkData.push({
url: params.request.url,
method: params.request.method,
timestamp: params.timestamp
});
});
const report = {
url,
loadTime,
metrics: {
layoutDuration: metricMap.LayoutDuration,
recalcStyleDuration: metricMap.RecalcStyleDuration,
scriptDuration: metricMap.ScriptDuration,
taskDuration: metricMap.TaskDuration
},
networkRequests: networkData.length
};
await browser.close();
return report;
}
analyzePerformance('https://example.com').then(console.log);
```
**场景 2:网络请求分析**
```javascript
async function analyzeNetworkRequests(url) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Network.enable');
const requests = [];
client.on('Network.requestWillBeSent', (params) => {
requests.push({
requestId: params.requestId,
url: params.request.url,
method: params.request.method,
type: params.type,
timestamp: params.timestamp
});
});
client.on('Network.responseReceived', (params) => {
const request = requests.find(r => r.requestId === params.requestId);
if (request) {
request.status = params.response.status;
request.mimeType = params.response.mimeType;
request.size = params.response.encodedDataLength;
}
});
await page.goto(url, { waitUntil: 'networkidle2' });
// 分析请求
const analysis = {
totalRequests: requests.length,
byType: {},
byStatus: {},
totalSize: 0
};
requests.forEach(req => {
// 按类型统计
if (!analysis.byType[req.type]) {
analysis.byType[req.type] = { count: 0, size: 0 };
}
analysis.byType[req.type].count++;
analysis.byType[req.type].size += req.size || 0;
// 按状态码统计
if (!analysis.byStatus[req.status]) {
analysis.byStatus[req.status] = 0;
}
analysis.byStatus[req.status]++;
analysis.totalSize += req.size || 0;
});
await browser.close();
return analysis;
}
analyzeNetworkRequests('https://example.com').then(console.log);
```
**场景 3:内存分析**
```javascript
async function analyzeMemory(url) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Runtime.enable');
await client.send('HeapProfiler.enable');
await page.goto(url, { waitUntil: 'networkidle2' });
// 获取堆快照
const heapSnapshot = await client.send('HeapProfiler.takeHeapSnapshot', {
reportProgress: false
});
// 获取内存使用情况
const memoryMetrics = await client.send('Runtime.getHeapUsage');
const report = {
totalSize: memoryMetrics.totalSize,
usedSize: memoryMetrics.usedSize,
heapSnapshot: heapSnapshot
};
await browser.close();
return report;
}
analyzeMemory('https://example.com').then(console.log);
```
**场景 4:JavaScript 执行分析**
```javascript
async function analyzeJavaScript(url) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Runtime.enable');
await client.send('Debugger.enable');
const consoleLogs = [];
const exceptions = [];
client.on('Runtime.consoleAPICalled', (params) => {
consoleLogs.push({
type: params.type,
args: params.args.map(arg => arg.value)
});
});
client.on('Runtime.exceptionThrown', (params) => {
exceptions.push({
message: params.exceptionDetails.exception?.description,
stackTrace: params.exceptionDetails.stackTrace
});
});
await page.goto(url, { waitUntil: 'networkidle2' });
const report = {
consoleLogs,
exceptions,
hasErrors: exceptions.length > 0
};
await browser.close();
return report;
}
analyzeJavaScript('https://example.com').then(console.log);
```
**8. CDP 高级功能**
**覆盖代码:**
```javascript
const client = await page.target().createCDPSession();
await client.send('DOM.enable');
await client.send('CSS.enable');
// 启用代码覆盖
await client.send('Profiler.enable');
await client.send('Profiler.startPreciseCoverage', {
callCount: true,
detailed: true
});
// 执行操作
await page.goto('https://example.com');
// 获取覆盖数据
const coverage = await client.send('Profiler.takePreciseCoverage');
console.log('Coverage:', coverage.result);
```
**监控长任务:**
```javascript
const client = await page.target().createCDPSession();
await client.send('Performance.enable');
client.on('Performance.metrics', (params) => {
params.metrics.forEach(metric => {
if (metric.name === 'TaskDuration' && metric.value > 50) {
console.warn('Long task detected:', metric.value, 'ms');
}
});
});
```
**监控布局抖动:**
```javascript
const client = await page.target().createCDPSession();
await client.send('Performance.enable');
const layoutShifts = [];
client.on('Performance.metrics', (params) => {
params.metrics.forEach(metric => {
if (metric.name === 'LayoutShift') {
layoutShifts.push(metric.value);
}
});
});
// 计算累积布局偏移
const cls = layoutShifts.reduce((sum, shift) => sum + shift, 0);
console.log('Cumulative Layout Shift:', cls);
```
**9. 最佳实践**
**1. 及时禁用 CDP 域:**
```javascript
try {
await client.send('Performance.enable');
// 操作
} finally {
await client.send('Performance.disable');
}
```
**2. 批量获取数据:**
```javascript
// 一次性获取多个指标
const [metrics, networkData] = await Promise.all([
client.send('Performance.getMetrics'),
client.send('Network.getResponseBody', { requestId: 'xxx' })
]);
```
**3. 使用事件过滤:**
```javascript
client.on('Network.requestWillBeSent', (params) => {
// 只处理特定请求
if (params.request.url.includes('/api/')) {
console.log('API Request:', params.request.url);
}
});
```
**4. 错误处理:**
```javascript
try {
await client.send('Performance.getMetrics');
} catch (error) {
console.error('CDP error:', error);
// 降级处理
}
```
前端 · 2月19日 19:40
Puppeteer 中有哪些等待机制?如何正确使用它们来处理异步操作?Puppeteer 提供了多种等待机制来处理异步操作和页面加载,确保在执行操作前页面状态已就绪。
**1. page.waitForNavigation()**
等待页面导航完成,适用于点击链接、提交表单等会触发页面跳转的操作。
```javascript
await Promise.all([
page.waitForNavigation(),
page.click('#submit-button')
]);
```
**参数选项:**
- `waitUntil`: 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2'
- `timeout`: 超时时间(毫秒)
**2. page.waitForSelector(selector)**
等待指定选择器出现在页面中。
```javascript
await page.waitForSelector('.result-item', { visible: true });
```
**参数选项:**
- `visible`: 等待元素可见
- `hidden`: 等待元素隐藏
- `timeout`: 超时时间
**3. page.waitForXPath(xpath)**
等待 XPath 选择器匹配的元素。
```javascript
await page.waitForXPath('//div[@class="content"]');
```
**4. page.waitForFunction(pageFunction, ...args)**
等待自定义函数返回真值,最灵活的等待方式。
```javascript
await page.waitForFunction(
() => document.querySelectorAll('.item').length > 5
);
// 带参数
await page.waitForFunction(
(count) => document.querySelectorAll('.item').length >= count,
{},
10
);
```
**5. page.waitForTimeout(milliseconds)**
等待指定时间(已废弃,建议使用 setTimeout)。
```javascript
// 旧方法(已废弃)
await page.waitForTimeout(1000);
// 新方法
await new Promise(resolve => setTimeout(resolve, 1000));
```
**6. page.waitForResponse(urlOrPredicate)**
等待特定的网络响应。
```javascript
// 等待特定 URL 的响应
await page.waitForResponse('https://api.example.com/data');
// 使用谓词函数
await page.waitForResponse(response =>
response.url().includes('/api/') &&
response.status() === 200
);
```
**7. page.waitForRequest(urlOrPredicate)**
等待特定的网络请求。
```javascript
await page.waitForRequest(request =>
request.url().includes('/api/data')
);
```
**8. page.waitForFrame(frame)**
等待指定的 iframe 加载完成。
```javascript
const frame = await page.waitForFrame('iframe-name');
```
**最佳实践:**
**1. 选择合适的等待方法:**
- 导航操作 → `waitForNavigation`
- 元素操作 → `waitForSelector`
- 复杂条件 → `waitForFunction`
- API 调用 → `waitForResponse`
**2. 设置合理的超时时间:**
```javascript
await page.waitForSelector('.element', {
timeout: 5000 // 5 秒超时
});
```
**3. 使用 Promise.all 并行等待:**
```javascript
await Promise.all([
page.waitForNavigation(),
page.click('#link'),
page.waitForSelector('.loaded')
]);
```
**4. 处理超时异常:**
```javascript
try {
await page.waitForSelector('.element', { timeout: 3000 });
} catch (error) {
console.log('Element not found within timeout');
}
```
**5. 优化等待策略:**
```javascript
// 等待网络空闲(推荐)
await page.waitForNavigation({ waitUntil: 'networkidle2' });
// 等待特定元素可见
await page.waitForSelector('.element', { visible: true });
```
**常见问题解决:**
**问题 1:元素存在但不可见**
```javascript
// 解决方案:等待元素可见
await page.waitForSelector('.element', { visible: true });
```
**问题 2:动态加载内容**
```javascript
// 解决方案:使用 waitForFunction 检查内容
await page.waitForFunction(() =>
document.querySelectorAll('.item').length > 0
);
```
**问题 3:SPA 路由变化**
```javascript
// 解决方案:等待 URL 变化
await page.waitForFunction(() =>
window.location.pathname === '/new-page'
);
```
前端 · 2月19日 19:40
Puppeteer 如何进行错误处理和调试?有哪些常用的调试技巧和工具?Puppeteer 提供了多种错误处理和调试技巧,帮助开发者快速定位和解决问题,提高开发效率。
**1. 基本错误处理**
**try-catch 模式:**
```javascript
const puppeteer = require('puppeteer');
async function safeExecution() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
try {
await page.goto('https://example.com');
await page.click('#button');
} catch (error) {
console.error('Error occurred:', error.message);
// 错误处理逻辑
} finally {
await browser.close();
}
}
safeExecution();
```
**超时处理:**
```javascript
try {
await page.goto('https://example.com', { timeout: 5000 });
} catch (error) {
if (error.name === 'TimeoutError') {
console.log('Page load timeout');
}
}
```
**2. 调试模式**
**启用调试模式:**
```javascript
// 方法 1:使用 headless: false
const browser = await puppeteer.launch({
headless: false,
slowMo: 100 // 减慢操作速度
});
// 方法 2:使用 devtools
const browser = await puppeteer.launch({
headless: false,
devtools: true
});
```
**使用 slowMo:**
```javascript
const browser = await puppeteer.launch({
headless: false,
slowMo: 50 // 每个操作延迟 50ms
});
```
**3. 日志记录**
**控制台日志:**
```javascript
page.on('console', msg => {
console.log('Browser console:', msg.text());
});
// 捕获不同类型的日志
page.on('console', msg => {
const type = msg.type();
const text = msg.text();
if (type === 'error') {
console.error('Browser error:', text);
} else if (type === 'warning') {
console.warn('Browser warning:', text);
} else {
console.log('Browser log:', text);
}
});
```
**页面错误日志:**
```javascript
page.on('pageerror', error => {
console.error('Page error:', error.message);
});
```
**请求失败日志:**
```javascript
page.on('requestfailed', request => {
console.log('Request failed:', request.url());
console.log('Failure:', request.failure());
});
```
**4. 截图和视频录制**
**错误时截图:**
```javascript
async function withErrorScreenshot(page, operation) {
try {
await operation();
} catch (error) {
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
await page.screenshot({
path: `error-${timestamp}.png`,
fullPage: true
});
throw error;
}
}
// 使用示例
await withErrorScreenshot(page, async () => {
await page.goto('https://example.com');
await page.click('#button');
});
```
**视频录制:**
```javascript
const { spawn } = require('child_process');
async function recordVideo(page, outputPath, operation) {
// 使用 ffmpeg 录制屏幕
const ffmpeg = spawn('ffmpeg', [
'-f', 'x11grab',
'-r', '30',
'-s', '1920x1080',
'-i', ':99',
'-c:v', 'libx264',
'-preset', 'ultrafast',
outputPath
]);
try {
await operation();
} finally {
ffmpeg.kill('SIGINT');
}
}
```
**5. 网络调试**
**监控网络请求:**
```javascript
page.on('request', request => {
console.log('Request:', request.url());
});
page.on('response', response => {
console.log('Response:', response.url(), response.status());
});
page.on('requestfinished', request => {
console.log('Request finished:', request.url());
});
```
**捕获请求和响应数据:**
```javascript
const requests = [];
page.on('request', request => {
requests.push({
url: request.url(),
method: request.method(),
headers: request.headers()
});
});
page.on('response', async response => {
const request = requests.find(r => r.url === response.url());
if (request) {
request.status = response.status();
request.headers = response.headers();
try {
request.body = await response.text();
} catch (error) {
request.body = null;
}
}
});
```
**6. 性能追踪**
**启用性能追踪:**
```javascript
const client = await page.target().createCDPSession();
await client.send('Performance.enable');
await client.send('Network.enable');
// 获取性能指标
const metrics = await client.send('Performance.getMetrics');
console.log('Performance metrics:', metrics);
```
**追踪时间线:**
```javascript
await page.tracing.start({ path: 'trace.json' });
// 执行操作
await page.goto('https://example.com');
await page.tracing.stop();
```
**7. 元素调试**
**高亮元素:**
```javascript
async function highlightElement(page, selector) {
await page.evaluate(selector => {
const element = document.querySelector(selector);
if (element) {
element.style.border = '3px solid red';
element.style.backgroundColor = 'yellow';
}
}, selector);
}
```
**检查元素状态:**
```javascript
async function checkElement(page, selector) {
const isVisible = await page.isVisible(selector);
const isEnabled = await page.isDisabled(selector);
const isClickable = await page.isClickable(selector);
console.log('Element state:', {
selector,
isVisible,
isEnabled,
isClickable
});
}
```
**获取元素位置:**
```javascript
const position = await page.evaluate(selector => {
const element = document.querySelector(selector);
if (element) {
const rect = element.getBoundingClientRect();
return {
x: rect.left,
y: rect.top,
width: rect.width,
height: rect.height
};
}
}, '.element');
```
**8. 调试工具函数**
**等待并调试:**
```javascript
async function waitForAndDebug(page, selector, options = {}) {
console.log(`Waiting for selector: ${selector}`);
try {
await page.waitForSelector(selector, {
timeout: options.timeout || 30000,
visible: options.visible !== false
});
console.log(`Found selector: ${selector}`);
} catch (error) {
console.error(`Failed to find selector: ${selector}`);
await page.screenshot({ path: 'debug-failed.png' });
throw error;
}
}
```
**点击并调试:**
```javascript
async function clickAndDebug(page, selector) {
console.log(`Attempting to click: ${selector}`);
try {
// 检查元素是否存在
const element = await page.$(selector);
if (!element) {
throw new Error(`Element not found: ${selector}`);
}
// 检查元素是否可见
const isVisible = await element.isIntersectingViewport();
if (!isVisible) {
console.warn('Element is not visible, scrolling to it');
await element.scrollIntoView();
}
await element.click();
console.log(`Successfully clicked: ${selector}`);
} catch (error) {
console.error(`Failed to click: ${selector}`, error);
await page.screenshot({ path: 'debug-click-failed.png' });
throw error;
}
}
```
**9. 常见错误及解决方案**
**错误 1:元素未找到**
```javascript
// 问题:元素选择器错误
await page.click('.wrong-selector');
// 解决方案:使用正确的选择器
await page.click('.correct-selector');
// 或者等待元素出现
await page.waitForSelector('.correct-selector');
await page.click('.correct-selector');
```
**错误 2:元素不可点击**
```javascript
// 问题:元素被遮挡或不可见
await page.click('.hidden-button');
// 解决方案:滚动到元素
await page.evaluate(selector => {
document.querySelector(selector).scrollIntoView();
}, '.hidden-button');
await page.click('.hidden-button');
```
**错误 3:超时错误**
```javascript
// 问题:页面加载超时
await page.goto('https://slow-website.com');
// 解决方案:增加超时时间
await page.goto('https://slow-website.com', { timeout: 60000 });
// 或使用更宽松的等待条件
await page.goto('https://slow-website.com', {
waitUntil: 'domcontentloaded'
});
```
**错误 4:内存泄漏**
```javascript
// 问题:未关闭浏览器实例
const browser = await puppeteer.launch();
// 忘记关闭
// 解决方案:使用 finally 确保关闭
const browser = await puppeteer.launch();
try {
// 操作
} finally {
await browser.close();
}
```
**10. 调试最佳实践**
**1. 使用描述性日志:**
```javascript
console.log(`[INFO] Navigating to ${url}`);
console.log(`[DEBUG] Found ${elements.length} elements`);
console.log(`[ERROR] Failed to click button: ${error.message}`);
```
**2. 保存调试信息:**
```javascript
const debugInfo = {
url: page.url(),
timestamp: new Date().toISOString(),
screenshot: await page.screenshot({ encoding: 'base64' }),
html: await page.content(),
cookies: await page.cookies()
};
require('fs').writeFileSync('debug.json', JSON.stringify(debugInfo, null, 2));
```
**3. 使用条件断点:**
```javascript
await page.evaluate(() => {
debugger; // 在浏览器中暂停
});
```
**4. 分步调试:**
```javascript
// 使用 slowMo 减慢操作
const browser = await puppeteer.launch({ slowMo: 100 });
// 或在关键步骤添加延迟
await new Promise(resolve => setTimeout(resolve, 1000));
```
**5. 使用调试器:**
```javascript
// 在代码中添加 debugger
debugger;
// 使用 Node.js 调试器运行
node --inspect-brk script.js
```
**11. 测试和验证**
**单元测试示例:**
```javascript
const assert = require('assert');
async function testPageLoad() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
const title = await page.title();
assert.strictEqual(title, 'Example Domain');
await browser.close();
}
testPageLoad().catch(console.error);
```
**集成测试示例:**
```javascript
async function testUserFlow() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// 测试登录流程
await page.goto('https://example.com/login');
await page.type('#username', 'testuser');
await page.type('#password', 'password');
await page.click('#login-button');
// 验证登录成功
await page.waitForSelector('.user-profile');
const isLoggedIn = await page.$('.user-profile') !== null;
assert(isLoggedIn, 'Login failed');
await browser.close();
}
testUserFlow().catch(console.error);
```
前端 · 2月19日 19:40
Puppeteer 如何管理 Cookie 和存储?如何实现会话持久化和多账户管理?Puppeteer 提供了强大的 Cookie 和存储管理功能,可以模拟真实的用户会话、保持登录状态、管理本地存储等。
**1. Cookie 管理**
**获取所有 Cookie:**
```javascript
const cookies = await page.cookies();
console.log(cookies);
```
**获取特定 URL 的 Cookie:**
```javascript
const cookies = await page.cookies('https://example.com');
```
**设置 Cookie:**
```javascript
await page.setCookie({
name: 'session_id',
value: 'abc123',
domain: '.example.com',
path: '/',
expires: Math.floor(Date.now() / 1000) + 3600, // 1 小时后过期
httpOnly: true,
secure: true,
sameSite: 'Lax'
});
```
**设置多个 Cookie:**
```javascript
await page.setCookie(
{ name: 'cookie1', value: 'value1', domain: '.example.com' },
{ name: 'cookie2', value: 'value2', domain: '.example.com' }
);
```
**删除 Cookie:**
```javascript
// 删除指定 Cookie
await page.deleteCookie({ name: 'session_id', domain: '.example.com' });
// 删除所有 Cookie
const cookies = await page.cookies();
await page.deleteCookie(...cookies);
```
**清除所有 Cookie:**
```javascript
await page.evaluate(() => {
document.cookie.split(";").forEach(c => {
document.cookie = c.replace(/^ +/, "").replace(/=.*/, "=;expires=" + new Date().toUTCString() + ";path=/");
});
});
```
**2. LocalStorage 管理**
**获取 LocalStorage 数据:**
```javascript
const localStorageData = await page.evaluate(() => {
const data = {};
for (let i = 0; i < localStorage.length; i++) {
const key = localStorage.key(i);
data[key] = localStorage.getItem(key);
}
return data;
});
```
**设置 LocalStorage 数据:**
```javascript
await page.evaluate(() => {
localStorage.setItem('user_id', '12345');
localStorage.setItem('preferences', JSON.stringify({ theme: 'dark' }));
});
```
**获取特定 LocalStorage 项:**
```javascript
const userId = await page.evaluate(() => {
return localStorage.getItem('user_id');
});
```
**删除 LocalStorage 项:**
```javascript
await page.evaluate(() => {
localStorage.removeItem('user_id');
});
```
**清除所有 LocalStorage:**
```javascript
await page.evaluate(() => {
localStorage.clear();
});
```
**3. SessionStorage 管理**
**获取 SessionStorage 数据:**
```javascript
const sessionStorageData = await page.evaluate(() => {
const data = {};
for (let i = 0; i < sessionStorage.length; i++) {
const key = sessionStorage.key(i);
data[key] = sessionStorage.getItem(key);
}
return data;
});
```
**设置 SessionStorage 数据:**
```javascript
await page.evaluate(() => {
sessionStorage.setItem('temp_data', 'temporary_value');
});
```
**清除所有 SessionStorage:**
```javascript
await page.evaluate(() => {
sessionStorage.clear();
});
```
**4. IndexedDB 管理**
**获取 IndexedDB 数据:**
```javascript
const indexedDBData = await page.evaluate(async () => {
return new Promise((resolve, reject) => {
const request = indexedDB.open('myDatabase', 1);
request.onsuccess = (event) => {
const db = event.target.result;
const transaction = db.transaction(['myStore'], 'readonly');
const store = transaction.objectStore('myStore');
const getAllRequest = store.getAll();
getAllRequest.onsuccess = () => {
resolve(getAllRequest.result);
};
getAllRequest.onerror = () => {
reject(getAllRequest.error);
};
};
request.onerror = () => {
reject(request.error);
};
});
});
```
**5. 浏览器上下文和隔离**
**使用 Incognito 上下文:**
```javascript
const context = await browser.createIncognitoBrowserContext();
const page = await context.newPage();
// 在隔离环境中操作
await page.goto('https://example.com');
// 关闭上下文,清除所有数据
await context.close();
```
**多个隔离上下文:**
```javascript
// 创建多个隔离的上下文
const context1 = await browser.createIncognitoBrowserContext();
const context2 = await browser.createIncognitoBrowserContext();
const page1 = await context1.newPage();
const page2 = await context2.newPage();
// 两个上下文的 Cookie 和存储完全隔离
```
**6. 会话持久化**
**保存会话状态:**
```javascript
async function saveSession(page, filePath) {
const cookies = await page.cookies();
const localStorage = await page.evaluate(() => {
const data = {};
for (let i = 0; i < localStorage.length; i++) {
const key = localStorage.key(i);
data[key] = localStorage.getItem(key);
}
return data;
});
const session = {
cookies,
localStorage,
url: page.url(),
timestamp: Date.now()
};
const fs = require('fs');
fs.writeFileSync(filePath, JSON.stringify(session, null, 2));
}
```
**恢复会话状态:**
```javascript
async function restoreSession(page, filePath) {
const fs = require('fs');
const session = JSON.parse(fs.readFileSync(filePath, 'utf8'));
// 恢复 Cookie
await page.setCookie(...session.cookies);
// 恢复 LocalStorage
await page.evaluate((data) => {
for (const [key, value] of Object.entries(data)) {
localStorage.setItem(key, value);
}
}, session.localStorage);
// 导航到之前的 URL
await page.goto(session.url);
}
```
**7. 实际应用场景**
**场景 1:保持登录状态**
```javascript
async function loginAndSaveSession() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// 登录
await page.goto('https://example.com/login');
await page.type('#username', 'user@example.com');
await page.type('#password', 'password');
await page.click('#login-button');
await page.waitForNavigation();
// 保存会话
await saveSession(page, 'session.json');
await browser.close();
}
async function useSavedSession() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// 恢复会话
await restoreSession(page, 'session.json');
// 直接访问需要登录的页面
await page.goto('https://example.com/dashboard');
// 验证是否已登录
const isLoggedIn = await page.$('.user-profile') !== null;
console.log('Is logged in:', isLoggedIn);
await browser.close();
}
```
**场景 2:多账户管理**
```javascript
async function manageMultipleAccounts(accounts) {
const browser = await puppeteer.launch();
for (const account of accounts) {
// 为每个账户创建隔离的上下文
const context = await browser.createIncognitoBrowserContext();
const page = await context.newPage();
// 登录账户
await page.goto('https://example.com/login');
await page.type('#username', account.username);
await page.type('#password', account.password);
await page.click('#login-button');
await page.waitForNavigation();
// 执行账户操作
await page.goto('https://example.com/dashboard');
const data = await page.evaluate(() => {
return document.querySelector('.user-data').textContent;
});
console.log(`Account ${account.username}: ${data}`);
// 关闭上下文,清除数据
await context.close();
}
await browser.close();
}
manageMultipleAccounts([
{ username: 'user1@example.com', password: 'pass1' },
{ username: 'user2@example.com', password: 'pass2' }
]);
```
**场景 3:A/B 测试**
```javascript
async function abTesting(url, variants) {
const browser = await puppeteer.launch();
for (const variant of variants) {
const context = await browser.createIncognitoBrowserContext();
const page = await context.newPage();
// 设置 A/B 测试 Cookie
await page.setCookie({
name: 'ab_test_variant',
value: variant.id,
domain: new URL(url).hostname
});
await page.goto(url);
// 收集数据
const data = await page.evaluate(() => {
return {
title: document.title,
content: document.querySelector('.content')?.textContent
};
});
console.log(`Variant ${variant.id}:`, data);
await context.close();
}
await browser.close();
}
abTesting('https://example.com', [
{ id: 'A' },
{ id: 'B' }
]);
```
**场景 4:购物车持久化**
```javascript
async function saveShoppingCart(page, userId) {
const cartData = await page.evaluate(() => {
return JSON.parse(localStorage.getItem('cart') || '[]');
});
const fs = require('fs');
const filePath = `carts/${userId}.json`;
fs.writeFileSync(filePath, JSON.stringify(cartData, null, 2));
}
async function restoreShoppingCart(page, userId) {
const fs = require('fs');
const filePath = `carts/${userId}.json`;
if (fs.existsSync(filePath)) {
const cartData = JSON.parse(fs.readFileSync(filePath, 'utf8'));
await page.evaluate((data) => {
localStorage.setItem('cart', JSON.stringify(data));
}, cartData);
}
}
```
**8. 安全注意事项**
**1. 敏感数据保护:**
```javascript
// 不要在代码中硬编码敏感信息
// 使用环境变量
const password = process.env.PASSWORD;
// 不要将包含敏感信息的会话文件提交到版本控制
// 将 session.json 添加到 .gitignore
```
**2. Cookie 安全:**
```javascript
// 设置安全的 Cookie 属性
await page.setCookie({
name: 'session',
value: 'value',
httpOnly: true, // 防止 XSS 攻击
secure: true, // 仅通过 HTTPS 传输
sameSite: 'Strict' // 防止 CSRF 攻击
});
```
**3. 会话过期处理:**
```javascript
async function checkSessionValidity(page) {
const cookies = await page.cookies();
const sessionCookie = cookies.find(c => c.name === 'session_id');
if (!sessionCookie || sessionCookie.expires * 1000 < Date.now()) {
// 会话已过期,重新登录
await relogin(page);
}
}
```
**9. 最佳实践**
**1. 使用隔离上下文:**
```javascript
// 为每个用户或会话创建隔离的上下文
const context = await browser.createIncognitoBrowserContext();
const page = await context.newPage();
// 操作完成后关闭上下文
await context.close();
```
**2. 定期清理:**
```javascript
// 定期清理过期的 Cookie 和存储
async function cleanupStorage(page) {
const cookies = await page.cookies();
const validCookies = cookies.filter(c =>
!c.expires || c.expires * 1000 > Date.now()
);
await page.deleteCookie(...cookies);
await page.setCookie(...validCookies);
}
```
**3. 错误处理:**
```javascript
try {
await page.setCookie(cookie);
} catch (error) {
console.error('Failed to set cookie:', error);
// 处理错误
}
```
**4. 性能优化:**
```javascript
// 批量操作 Cookie
await page.setCookie(...cookies);
// 避免频繁的存储操作
const data = await page.evaluate(() => {
// 一次性获取所有需要的数据
return {
localStorage: { ...localStorage },
sessionStorage: { ...sessionStorage }
};
});
```
前端 · 2月19日 19:39