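What are the common Nginx problems, and how do you troubleshoot them?

Nginx can fail in a number of ways in production, and a systematic troubleshooting method is what lets you resolve incidents quickly.

Common problems and solutions: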

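1. 502 Bad Gateway

Cause: the backend service is down, refuses the connection, or returns an invalid response.

Diagnostic steps:

bash
# Check that the backend service and Nginx are running
systemctl status php-fpm
systemctl status nginx

# Check that the backend is listening on its port (9000 for PHP-FPM over TCP)
netstat -tlnp | grep :9000

# Follow the Nginx error log
tail -f /var/log/nginx/error.log

# Follow the backend service log
tail -f /var/log/php-fpm/error.log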

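Solution:

If the backend is stopped, or listens on a different address than the one given in proxy_pass or fastcgi_pass, fix that first. Longer timeouts help only when the backend is slow.

nginx
# Timeouts for HTTP backends (proxy_pass)
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;

# Equivalent timeouts for FastCGI backends such as PHP-FPM (fastcgi_pass)
fastcgi_connect_timeout 60s;
fastcgi_send_timeout 60s;
fastcgi_read_timeout 60s;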
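
To decide whether a timeout change is even relevant, it helps to count which kind of upstream failure the error log actually records. The following is a minimal sketch: the function name `summarize_upstream_errors` and the sample log lines are illustrative assumptions, while the matched phrases are messages Nginx writes when an upstream connection fails.

```shell
# Summarize upstream failures in an Nginx error log read from stdin.
summarize_upstream_errors() {
    awk '
        /connect\(\) failed \(111: Connection refused\)/ { refused++ }
        /upstream timed out/                              { timed_out++ }
        /upstream prematurely closed connection/          { closed++ }
        END {
            printf "refused=%d timed_out=%d closed_early=%d\n", refused, timed_out, closed
        }
    '
}

# Hypothetical sample lines in the error log format, for illustration only.
sample_log='2024/01/01 12:00:00 [error] 1234#1234: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.0.1, server: example.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "example.com"
2024/01/01 12:00:01 [error] 1234#1234: *2 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.0.2, server: example.com, request: "GET /a HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "example.com"
2024/01/01 12:00:05 [error] 1234#1234: *3 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.0.3, server: example.com, request: "GET /b HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "example.com"'

printf '%s\n' "$sample_log" | summarize_upstream_errors
```

Against a live server the same function can read the real log, for example `summarize_upstream_errors < /var/log/nginx/error.log`. A high `refused` count points at a stopped or misaddressed backend rather than at timeouts.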

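2. 504 Gateway Timeout

Cause: the backend takes longer to respond than the configured timeout allows.

Diagnostic steps:

bash
# Check the load of processes owned by the nginx user (adjust the user if the backend runs under another account)
top -u nginx
htop

# Check for long-running or blocked database queries
mysql -u root -p -e "SHOW PROCESSLIST;"

# Follow the slow query log
tail -f /var/log/mysql/slow.log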

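Solution:

nginx
# Raise the read timeout for long-running requests
proxy_read_timeout 300s;
fastcgi_read_timeout 300s;

# A longer timeout only hides the symptom; also:
# - optimize backend performance
# - optimize database queries
# - add caching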
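
Before raising timeouts, it is worth finding out which requests are slow. The sketch below assumes the access log uses a custom `log_format` that appends `$request_time` as the last field of the combined format; that format, the function name `slow_requests`, and the sample lines are assumptions for illustration.

```shell
# Print requests slower than a threshold (in seconds), slowest first.
# Assumes $request_time is the LAST field of each access log line.
slow_requests() {
    awk -v limit="$1" '$NF + 0 > limit + 0 { print $NF, $7 }' | LC_ALL=C sort -rn
}

# Hypothetical sample lines: combined format plus a trailing $request_time.
sample_log='10.0.0.1 - - [01/Jan/2024:12:00:00 +0000] "GET /report HTTP/1.1" 200 512 "-" "curl/8.0" 12.504
10.0.0.2 - - [01/Jan/2024:12:00:01 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "curl/8.0" 0.003
10.0.0.3 - - [01/Jan/2024:12:00:02 +0000] "GET /export HTTP/1.1" 504 0 "-" "curl/8.0" 60.001'

# List requests that took longer than 5 seconds.
printf '%s\n' "$sample_log" | slow_requests 5
```

The paths that dominate this list are the ones to optimize or cache; a global timeout increase is then needed only for the few endpoints that are legitimately long-running.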

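3. 403 Forbidden

Cause: insufficient file permissions, a directory request with no index file while directory listing is disabled, or an access control rule such as deny.

Diagnostic steps:

bash
# Check file permissions
ls -la /var/www/html

# Check which user the Nginx workers run as
ps aux | grep nginx

# Check the SELinux mode
getenforce

# List firewall rules (a firewall usually causes timeouts or refused connections rather than a 403; this step only rules it out)
iptables -L -n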

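Solution:

bash
# Give the Nginx user ownership; directories get 755 and files 644
chown -R nginx:nginx /var/www/html
find /var/www/html -type d -exec chmod 755 {} +
find /var/www/html -type f -exec chmod 644 {} +

# Switch SELinux to permissive mode temporarily, for diagnosis only (reverts on reboot)
setenforce 0

# Allow HTTP through firewalld
firewall-cmd --add-service=http --permanent
firewall-cmd --reload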
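
Correct permissions on the document root are not sufficient: the Nginx worker also needs execute (search) permission on every parent directory. The sketch below walks from a path up to `/` and reports directories that lack the execute bit for "other" users. It is a simplification that assumes the worker user is neither the owner nor in the group of those directories, and it relies on GNU `stat` and `readlink`; the function name `check_traversal` is hypothetical.

```shell
# Report every directory on the way from a path up to / that "other" users cannot traverse.
check_traversal() {
    path=$(readlink -f -- "$1") || return 2
    while :; do
        mode=$(stat -c '%a' -- "$path") || return 2
        # The leading 0 makes the shell read the mode as octal; bit 1 is execute for "other".
        if [ -d "$path" ] && [ $(( 0$mode & 1 )) -eq 0 ]; then
            echo "no o+x: $path"
        fi
        [ "$path" = "/" ] && break
        path=$(dirname -- "$path")
    done
}
```

For example, `check_traversal /var/www/html/index.html` prints one line per blocking directory and nothing when the whole chain is traversable. On systems with util-linux, `namei -l /var/www/html/index.html` shows the same chain with owners and modes.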

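4. 404 Not Found

Cause: the file does not exist, or the root or location paths are misconfigured.

Diagnostic steps:

bash
# Check that the file exists
ls -la /var/www/html

# Dump the full configuration and find the root directives
nginx -T | grep root

# Resolve symbolic links to the real path
readlink -f /var/www/html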

Solution:

nginx
# Verify that root points to the directory that actually contains the files
server {
    listen 80;
    server_name example.com;
    root /var/www/html;
    index index.html index.php;
    
    location / {
        try_files $uri $uri/ =404;
    }
}
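
To see why a given URI ends in a 404, it can help to reproduce by hand what `try_files $uri $uri/ =404` checks on disk. The following is a simplified sketch of that lookup order, not Nginx's actual implementation; the function name `resolve_try_files` is hypothetical, and it ignores location matching, alias, and rewrites.

```shell
# Mimic the lookup order of: try_files $uri $uri/ =404;
# $1 is the root directory, $2 is the request URI (starting with /).
resolve_try_files() {
    root=$1
    uri=$2
    if [ -f "$root$uri" ]; then
        echo "file: $root$uri"            # $uri matched a regular file
    elif [ -d "$root$uri" ]; then
        echo "directory: $root$uri"       # $uri/ matched; the index directive takes over
    else
        echo "404: $root$uri"             # nothing matched, so =404 applies
    fi
}
```

Calling `resolve_try_files /var/www/html /about.html` shows the exact filesystem path Nginx would look for, which makes a wrong `root` or a typo in the file name easy to spot.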

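5. 413 Request Entity Too Large

Cause: the request body (typically a file upload) exceeds client_max_body_size, which defaults to 1m.

Solution:

nginx
# Raise the request body limit (valid in the http, server, or location context)
client_max_body_size 100m;

# PHP enforces its own limits, set in php.ini (for example /etc/php.ini), not in the Nginx configuration:
# upload_max_filesize = 100M
# post_max_size = 100M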
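
Because the upload passes through several limits, the smallest one wins. The sketch below converts size strings with k, m, or g suffixes to bytes and prints the effective limit; the function names `to_bytes` and `effective_upload_limit` are hypothetical, and the example values pair the 100m Nginx setting with PHP's stock defaults of 8M and 2M.

```shell
# Convert a size such as 100m, 8M, or 512k to bytes (suffixes are case-insensitive).
to_bytes() {
    value=$(printf '%s' "$1" | tr 'KMG' 'kmg')
    number=${value%[kmg]}
    case $value in
        *k) echo $(( number * 1024 )) ;;
        *m) echo $(( number * 1024 * 1024 )) ;;
        *g) echo $(( number * 1024 * 1024 * 1024 )) ;;
        *)  echo "$number" ;;
    esac
}

# Print the smallest of the given limits, in bytes.
effective_upload_limit() {
    smallest=$(to_bytes "$1")
    shift
    for limit in "$@"; do
        bytes=$(to_bytes "$limit")
        if [ "$bytes" -lt "$smallest" ]; then
            smallest=$bytes
        fi
    done
    echo "$smallest"
}

# client_max_body_size, post_max_size, upload_max_filesize
effective_upload_limit 100m 8M 2M
```

In this example the result is 2097152 bytes (2 MB): raising only client_max_body_size removes the 413 from Nginx, but PHP would still reject larger uploads.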

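6. Connection limit reached

Cause: worker_connections is set too low. When the limit is hit, the error log reports "worker_connections are not enough".

Diagnostic steps:

bash
# Count current connections on port 80 (the pattern also matches ports such as :8080)
netstat -an | grep :80 | wc -l

# Query the stub_status page (requires the status location to be enabled)
curl http://localhost/nginx_status

# Check the open file descriptor limit of the current shell
ulimit -n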

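Solution:

nginx
# Raise the per-worker connection limit
events {
    worker_connections 10240;
}

# Raise the per-worker file descriptor limit (main context, outside events); keep it above worker_connections
worker_rlimit_nofile 65535;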
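
A rough capacity estimate helps when choosing these values. The sketch below applies the common rule of thumb that the client ceiling is worker_processes times worker_connections, halved for a reverse proxy because each client also occupies an upstream connection. The function name `max_clients` and the worker count of 4 are assumptions for illustration, and real capacity also depends on keepalive and file descriptor limits.

```shell
# Estimate the upper bound on simultaneous clients.
# $1 = worker_processes, $2 = worker_connections, $3 = "static" (default) or "proxy"
max_clients() {
    workers=$1
    connections=$2
    mode=${3:-static}
    total=$(( workers * connections ))
    if [ "$mode" = "proxy" ]; then
        # Each proxied client uses one client-side and one upstream-side connection.
        total=$(( total / 2 ))
    fi
    echo "$total"
}

max_clients 4 10240 static
max_clients 4 10240 proxy
```

With 4 workers and worker_connections 10240 this gives 40960 for static content and 20480 when proxying.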

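Diagnostic tools:

1. Configuration testing

bash
# Test the configuration for syntax errors
nginx -t

# Test the configuration and dump it, including all included files
nginx -T

# Test a specific configuration file
nginx -c /etc/nginx/nginx.conf -t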

2. Status monitoring

nginx
# Enable the status page and restrict it to localhost
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}
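
The status page is plain text, so it is easy to turn into metrics. The sketch below parses the four-line stub_status output and derives the number of dropped connections as accepts minus handled. The function name `parse_stub_status` and the sample numbers are assumptions; the layout follows the output format of the stub_status module.

```shell
# Parse stub_status output from stdin into a single key=value line.
parse_stub_status() {
    awk '
        /^Active connections:/ { active = $3 }
        NR == 3                { accepts = $1; handled = $2 }
        /^Reading:/            { waiting = $6 }
        END {
            printf "active=%d dropped=%d waiting=%d\n", active, accepts - handled, waiting
        }
    '
}

# Hypothetical sample output, for illustration only.
sample_status='Active connections: 12
server accepts handled requests
 1000 990 5000
Reading: 1 Writing: 2 Waiting: 9'

printf '%s\n' "$sample_status" | parse_stub_status
```

On a live server the input would come from `curl -s http://localhost/nginx_status | parse_stub_status`. A non-zero `dropped` value means connections were accepted but not handled, which usually indicates that a resource limit such as worker_connections was reached.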

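3. Log analysis

bash
# Follow the error log in real time
tail -f /var/log/nginx/error.log

# Show the last 100 lines of the error log
tail -n 100 /var/log/nginx/error.log

# Search for upstream failures (the error log records messages, not HTTP status codes)
grep "upstream" /var/log/nginx/error.log

# Count responses per status code (field 9 in the default combined log format)
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn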
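
Raw counts per status code are easier to act on as a rate. The sketch below computes the share of 5xx responses in an access log, assuming the default combined format where the status is field 9; the function name `error_rate` and the sample lines are illustrative assumptions.

```shell
# Print the number and percentage of 5xx responses in an access log read from stdin.
error_rate() {
    awk '
        { total++ }
        $9 ~ /^5[0-9][0-9]$/ { errors++ }
        END {
            if (total == 0) { print "no requests"; exit }
            printf "5xx=%d total=%d rate=%.1f%%\n", errors, total, 100 * errors / total
        }
    '
}

# Hypothetical sample lines in the combined log format.
sample_log='10.0.0.1 - - [01/Jan/2024:12:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"
10.0.0.2 - - [01/Jan/2024:12:00:01 +0000] "GET /api HTTP/1.1" 502 0 "-" "curl/8.0"
10.0.0.3 - - [01/Jan/2024:12:00:02 +0000] "GET /api HTTP/1.1" 504 0 "-" "curl/8.0"
10.0.0.4 - - [01/Jan/2024:12:00:03 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"'

printf '%s\n' "$sample_log" | error_rate
```

Running it over a recent slice of the log, for example `tail -n 10000 /var/log/nginx/access.log | error_rate`, gives a quick signal for whether an incident is ongoing.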

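4. Performance analysis

bash
# Trace system calls of all Nginx processes (the quotes let strace accept the whole PID list)
strace -p "$(pidof nginx)"

# Capture traffic on port 80 (replace eth0 with the actual interface)
tcpdump -i eth0 port 80 -w nginx.pcap

# Count connections on port 80 by TCP state
netstat -an | grep :80 | awk '{print $6}' | sort | uniq -c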

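Troubleshooting performance problems:

1. High CPU usage

bash
# Watch CPU usage of the Nginx processes (top -p takes a comma-separated PID list)
top -p "$(pgrep -d, nginx)"

# Count worker processes
pgrep -c -f "nginx: worker"

# Show the CPU affinity of each worker
for pid in $(pgrep -f "nginx: worker"); do taskset -cp "$pid"; done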

Solution:

nginx
# Start one worker per CPU core
worker_processes auto;

# Pin workers to CPU cores
worker_cpu_affinity auto;

# Enable efficient file transmission
sendfile on;
tcp_nopush on;

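2. High memory usage

bash
# Check system memory
free -m

# Sum the resident memory (RSS, in KB) of all Nginx processes
ps -C nginx -o rss= | awk '{sum+=$1} END {print sum " KB"}'

# Check for memory leaks on a test instance (foreground, single process, so valgrind can track it)
valgrind --leak-check=full nginx -g 'daemon off; master_process off;'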

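Solution:

nginx
# Keep request buffers small
client_body_buffer_size 128k;
client_header_buffer_size 1k;

# Lower the per-worker connection limit (belongs inside the events block)
worker_connections 4096;

# Cache open file descriptors and metadata (the cache itself uses memory, so size max accordingly)
open_file_cache max=100000 inactive=20s;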

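3. Slow responses

bash
# Measure response time (curl-format.txt is a local file of curl --write-out variables such as %{time_total})
curl -w "@curl-format.txt" -o /dev/null -s http://example.com

# Check network latency
ping example.com

# Check DNS resolution
nslookup example.com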

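Solution:

nginx
# Enable gzip compression for responses of at least 1024 bytes
gzip on;
gzip_min_length 1024;

# Define a cache zone (http context); enable it per location with "proxy_cache cache;"
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=cache:10m;

# Tune TCP behavior (tcp_nopush takes effect only when sendfile is on)
tcp_nodelay on;
tcp_nopush on;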

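Troubleshooting security problems:

1. DDoS attacks

bash
# Count connections per remote IPv4 address
netstat -an | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn

# List the 20 client IPs with the most requests
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20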

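Solution:

nginx
# Rate limiting: define the zone in the http context, apply it in server or location
limit_req_zone $binary_remote_addr zone=limit:10m rate=10r/s;
limit_req zone=limit burst=20 nodelay;

# Connection limiting: at most 10 concurrent connections per client IP
limit_conn_zone $binary_remote_addr zone=conn:10m;
limit_conn conn 10;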
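
Choosing a sensible `rate` is easier with data on how bursty real clients are. The sketch below counts requests per client IP per second in an access log and lists the pairs that exceed a threshold. It assumes the combined log format (IP in field 1, timestamp in field 4); the function name `bursty_clients` and the sample lines are hypothetical.

```shell
# List "ip second count" for every client that exceeded $1 requests within one second.
bursty_clients() {
    awk -v limit="$1" '
        { hits[$1 " " substr($4, 2)]++ }      # key: client IP plus timestamp to the second
        END {
            for (key in hits)
                if (hits[key] > limit + 0) print key, hits[key]
        }
    ' | LC_ALL=C sort
}

# Hypothetical sample lines in the combined log format.
sample_log='10.0.0.9 - - [01/Jan/2024:12:00:00 +0000] "GET /a HTTP/1.1" 200 10 "-" "curl/8.0"
10.0.0.9 - - [01/Jan/2024:12:00:00 +0000] "GET /b HTTP/1.1" 200 10 "-" "curl/8.0"
10.0.0.9 - - [01/Jan/2024:12:00:00 +0000] "GET /c HTTP/1.1" 200 10 "-" "curl/8.0"
10.0.0.1 - - [01/Jan/2024:12:00:00 +0000] "GET / HTTP/1.1" 200 10 "-" "curl/8.0"
10.0.0.9 - - [01/Jan/2024:12:00:01 +0000] "GET /d HTTP/1.1" 200 10 "-" "curl/8.0"'

# Show clients with more than 2 requests in any single second.
printf '%s\n' "$sample_log" | bursty_clients 2
```

If legitimate clients regularly appear in this list at the intended rate, the `rate` or `burst` values are too strict for normal traffic.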

2. Malicious access

bash
# Look for suspicious User-Agent strings
grep -i "bot" /var/log/nginx/access.log

# Look for SQL injection attempts
grep -i "union.*select" /var/log/nginx/access.log

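Solution:

nginx
# Block by User-Agent (this pattern also blocks legitimate crawlers such as search engine bots; narrow it as needed)
if ($http_user_agent ~* (bot|crawl|spider)) {
    return 403;
}

# Reject query strings that look like SQL injection (a coarse filter, not a replacement for parameterized queries)
if ($args ~* "union.*select.*\(") {
    return 403;
}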

Monitoring and alerting:

1. System monitoring

Prometheus with Grafana, Zabbix, or Nagios.

2. Log monitoring

The ELK Stack, Graylog, or Fluentd.

3. Automated alerting

Alertmanager, PagerDuty, or a Slack integration.

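Best practices:

Back up configuration regularly: keep copies of the Nginx configuration files.
Monitor logs: watch the error log in real time.
Test performance: run load tests on a regular schedule.
Document: record common problems and their solutions.
Automate deployment: use a configuration management tool.
Use version control: manage configuration files with Git.
Update regularly: keep Nginx on a current release.
Audit security: run security checks on a regular schedule.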

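Troubleshooting workflow:

text
1. Confirm the symptoms
   ↓
2. Check the Nginx status
   ↓
3. Read the error log
   ↓
4. Check the configuration
   ↓
5. Check the backend service
   ↓
6. Check system resources
   ↓
7. Apply the fix
   ↓
8. Verify that the fix worked
   ↓
9. Record the problem and the solution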
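
The mechanical checks in this workflow can be scripted so they run the same way every time. The following is a minimal sketch: `run_step` and `diagnose` are hypothetical helper names, and the service names (nginx, php-fpm) and log path are the ones used in the examples above, which may differ on a given system.

```shell
# Run one check, hide its output, and print a one-line verdict.
run_step() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $name"
    else
        echo "FAIL $name"
        return 1
    fi
}

# Run the status, configuration, log, and backend checks in workflow order.
diagnose() {
    run_step "nginx is running"       systemctl is-active --quiet nginx
    run_step "configuration is valid" nginx -t
    run_step "error log is readable"  test -r /var/log/nginx/error.log
    run_step "backend is running"     systemctl is-active --quiet php-fpm
}
```

Calling `diagnose` prints one OK or FAIL line per check; the first FAIL indicates which step of the workflow to investigate by hand.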