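How do you monitor and operate Nginx? What monitoring tools are available?
Monitoring and operations are essential to keeping Nginx stable and performant. Good monitoring surfaces problems early so they can be fixed promptly.
Built-in status monitoring: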
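```nginx
# Enable the stub_status module (Nginx must be built with --with-http_stub_status_module)
server {
    listen 80;
    server_name localhost;

    location /nginx_status {
        stub_status;        # versions before 1.7.5 require an argument: "stub_status on;"
        access_log off;
        allow 127.0.0.1;    # only local requests may read the status page
        deny all;
    }
}
```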
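Status fields:
- Active connections: current number of active client connections, including those in the Waiting state
- accepts: total number of accepted client connections
- handled: total number of handled connections; normally equal to accepts unless a resource limit such as worker_connections was reached
- requests: total number of client requests
- Reading: connections in which Nginx is reading the request header
- Writing: connections in which Nginx is writing the response back to the client
- Waiting: idle keep-alive connections waiting for the next request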
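A monitoring script usually has to turn the plain-text status page into numbers. The following is a minimal Python sketch of that step, assuming the standard four-line stub_status output; the function name `parse_stub_status`, the dictionary keys, and the sample values are illustrative and not part of Nginx.

```python
import re


def parse_stub_status(text: str) -> dict:
    """Parse the plain-text body returned by a stub_status location."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The counters line holds three integers: accepts, handled, requests.
    counters = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not (active and counters and states):
        raise ValueError("unexpected stub_status format")

    accepts, handled, requests = (int(v) for v in counters.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
        # Connections accepted but never handled (non-zero hints at a resource limit).
        "dropped": accepts - handled,
    }


# Hypothetical sample body, shaped like real stub_status output.
SAMPLE = (
    "Active connections: 291 \n"
    "server accepts handled requests\n"
    " 16630948 16630948 31070465 \n"
    "Reading: 6 Writing: 179 Waiting: 106 \n"
)

if __name__ == "__main__":
    print(parse_stub_status(SAMPLE))
```

In practice the text would come from an HTTP request to the `/nginx_status` location configured above; the sample string only stands in for that response.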
Custom monitoring endpoints:
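```nginx
server {
    listen 80;
    server_name localhost;

    # Health check
    location /health {
        access_log off;
        # default_type sets the Content-Type for "return"; add_header would add a duplicate header
        default_type text/plain;
        return 200 "OK\n";
    }

    # Readiness check
    location /ready {
        access_log off;
        # Probe the backend (assumes an "upstream backend" block is defined elsewhere)
        proxy_pass http://backend/health;
        proxy_intercept_errors off;
    }

    # Version info
    location /version {
        access_log off;
        default_type text/plain;
        # $nginx_version reports the running version instead of a hard-coded string
        return 200 "nginx/$nginx_version\n";
    }
}
```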
Log monitoring:
```nginx
# Custom log format (log_format must be declared in the http context)
log_format monitoring '$remote_addr - $remote_user [$time_local] '
                      '"$request" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent" '
                      'rt=$request_time uct="$upstream_connect_time" '
                      'uht="$upstream_header_time" urt="$upstream_response_time" '
                      'cache=$upstream_cache_status';

server {
    listen 80;
    server_name example.com;

    access_log /var/log/nginx/monitoring.log monitoring;

    location / {
        proxy_pass http://backend;
    }
}
```
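To make use of the timing fields in this format, a log consumer has to extract them from each line. The sketch below shows one way to do that in Python, assuming the exact field order of the `monitoring` format above; `TAIL_RE`, `to_seconds`, `parse_timings`, and the sample line are hypothetical names and data introduced for illustration.

```python
import re

# Matches the tail of the "monitoring" log_format:
#   rt=... uct="..." uht="..." urt="..." cache=...
# Anchoring at the end of the line keeps query strings such as "?rt=9"
# in the request from being mistaken for the timing fields.
TAIL_RE = re.compile(
    r'rt=(?P<rt>\S+) uct="(?P<uct>[^"]*)" uht="(?P<uht>[^"]*)" '
    r'urt="(?P<urt>[^"]*)" cache=(?P<cache>\S+)\s*$'
)


def to_seconds(value):
    """Convert an Nginx time field to float seconds, or None when it is "-".

    Upstream fields may list several attempts separated by "," or ":";
    this sketch simply sums them.
    """
    parts = [p.strip() for p in re.split(r"[,:]", value)]
    numbers = [float(p) for p in parts if p and p != "-"]
    return sum(numbers) if numbers else None


def parse_timings(line):
    """Return the timing fields of one log line, or None if it does not match."""
    match = TAIL_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    cache = fields.pop("cache")
    result = {name: to_seconds(raw) for name, raw in fields.items()}
    result["cache"] = None if cache == "-" else cache
    return result


# Hypothetical log line in the "monitoring" format.
SAMPLE_LINE = (
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /api/users?rt=9 HTTP/1.1" '
    '200 512 "-" "curl/8.0" rt=0.123 uct="0.001" uht="0.120" urt="0.120" cache=MISS'
)

if __name__ == "__main__":
    print(parse_timings(SAMPLE_LINE))
```

Requests that never reach an upstream log `-` for the upstream fields, which this sketch maps to `None` rather than zero so that averages are not skewed.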
Prometheus monitoring:
```nginx
# Install nginx-prometheus-exporter
# https://github.com/nginxinc/nginx-prometheus-exporter

# Nginx configuration: expose the exporter's metrics through Nginx
server {
    listen 80;
    server_name localhost;

    location /metrics {
        proxy_pass http://localhost:9113/metrics;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}
```
Grafana + Prometheus monitoring:
```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']
```
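Counters such as the total request count only ever increase, so dashboards usually plot their per-second rate between two scrapes. The arithmetic can be sketched as below, assuming two samples taken one scrape interval apart; `per_second_rate` is a hypothetical helper, and it is a simplification of what Prometheus's `rate()` does, not a reimplementation of it.

```python
def per_second_rate(previous: float, current: float, interval_seconds: float) -> float:
    """Per-second increase of a monotonically increasing counter.

    If the counter went backwards, the process is assumed to have
    restarted and the counter to have started again from zero.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    if delta < 0:
        # Counter reset: everything counted so far happened after the restart.
        delta = current
    return delta / interval_seconds


if __name__ == "__main__":
    # Two hypothetical samples taken 15 s apart (the scrape_interval above).
    print(per_second_rate(1000, 1600, 15))
```

With the 15-second interval configured above, a counter that grows from 1000 to 1600 between scrapes corresponds to 40 requests per second.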
ELK Stack monitoring:
```nginx
# JSON log format (log_format must be declared in the http context)
log_format json_combined escape=json '{'
    '"time_local":"$time_local",'
    '"remote_addr":"$remote_addr",'
    '"remote_user":"$remote_user",'
    '"request":"$request",'
    '"status":"$status",'
    '"body_bytes_sent":"$body_bytes_sent",'
    '"request_time":"$request_time",'
    '"http_referrer":"$http_referer",'
    '"http_user_agent":"$http_user_agent"'
'}';

server {
    listen 80;
    server_name example.com;

    access_log /var/log/nginx/access.log json_combined;

    location / {
        proxy_pass http://backend;
    }
}
```
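Because every line in this format is a standalone JSON object, it can be aggregated without custom parsing. The sketch below computes a request count, a 5xx error rate, and a nearest-rank 95th-percentile request time from such lines, assuming the field names of the `json_combined` format above; `summarize` and the sample lines are hypothetical.

```python
import json
import math


def summarize(lines):
    """Aggregate json_combined log lines into a few headline numbers."""
    times = []
    errors = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # The format quotes every value, so status and request_time are strings.
        if entry["status"].startswith("5"):
            errors += 1
        times.append(float(entry["request_time"]))

    total = len(times)
    if total == 0:
        return {"requests": 0, "error_rate": 0.0, "p95_request_time": None}

    times.sort()
    rank = math.ceil(0.95 * total)  # nearest-rank percentile, 1-based
    return {
        "requests": total,
        "error_rate": errors / total,
        "p95_request_time": times[rank - 1],
    }


# Hypothetical log lines; only the fields used above are included.
SAMPLE_LINES = [
    '{"status":"200","request_time":"0.010"}',
    '{"status":"200","request_time":"0.020"}',
    '{"status":"502","request_time":"0.500"}',
    '{"status":"404","request_time":"0.030"}',
]

if __name__ == "__main__":
    print(summarize(SAMPLE_LINES))
```

In an ELK deployment this aggregation would normally be done by Elasticsearch queries; the script only illustrates what the JSON structure makes possible.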
Zabbix monitoring:
```bash
# Install Zabbix Agent
# Configure the monitoring items
# nginx_status[accepts]
# nginx_status[handled]
# nginx_status[requests]
# nginx_status[reading]
# nginx_status[writing]
# nginx_status[waiting]
```
Performance monitoring metrics:
```nginx
# Enable detailed timing logs (log_format must be declared in the http context)
log_format performance '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $body_bytes_sent '
                       'rt=$request_time '
                       'uct=$upstream_connect_time '
                       'uht=$upstream_header_time '
                       'urt=$upstream_response_time '
                       'cache=$upstream_cache_status';

server {
    listen 80;
    server_name example.com;

    access_log /var/log/nginx/performance.log performance;

    location / {
        proxy_pass http://backend;
    }
}
```
Alerting configuration:
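```nginx
# Log-based alerting (the map block belongs in the http context)
map $status $alert_level {
    ~^[5]   critical;
    ~^[4]   warning;
    default ok;
}

server {
    listen 80;
    server_name example.com;

    # Uses the "performance" log_format defined earlier
    access_log /var/log/nginx/access.log performance;

    location / {
        proxy_pass http://backend;
        # Add the alert header; without "always" it is omitted on 4xx/5xx responses
        add_header X-Alert-Level $alert_level always;
    }
}
```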
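The `map` block only classifies individual responses; an actual alert usually fires when the share of failing responses in a time window crosses a threshold. A minimal Python sketch of that decision follows, mirroring the same status classes; `alert_level`, `should_alert`, and the 5% default threshold are assumptions chosen for illustration.

```python
from collections import Counter


def alert_level(status: int) -> str:
    """Mirror the Nginx map: 5xx is critical, 4xx is warning, anything else is ok."""
    if 500 <= status <= 599:
        return "critical"
    if 400 <= status <= 499:
        return "warning"
    return "ok"


def should_alert(statuses, critical_ratio: float = 0.05) -> bool:
    """Return True when the share of 5xx responses reaches critical_ratio.

    statuses holds the status codes observed in one time window; the
    0.05 default is an illustrative threshold, not a recommendation.
    """
    counts = Counter(alert_level(s) for s in statuses)
    total = sum(counts.values())
    if total == 0:
        return False
    return counts["critical"] / total >= critical_ratio


if __name__ == "__main__":
    window = [200] * 9 + [500]
    print(alert_level(502), should_alert(window))
```

Alerting on a ratio over a window rather than on single responses avoids paging someone for one isolated 500.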
Automated operations script:
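```bash
#!/bin/bash
# nginx_monitor.sh

# Check the Nginx status page
check_nginx_status() {
    if ! curl -f http://localhost/nginx_status > /dev/null 2>&1; then
        echo "Nginx status page is not accessible"
        return 1
    fi
    return 0
}

# Check the process
check_nginx_process() {
    if ! pgrep -x nginx > /dev/null; then
        echo "Nginx process is not running"
        return 1
    fi
    return 0
}

# Check the port
check_nginx_port() {
    # Require whitespace after ":80" so that ":8080" does not match
    if ! netstat -tln | grep -qE ':80[[:space:]]'; then
        echo "Nginx is not listening on port 80"
        return 1
    fi
    return 0
}

# Main: run every check and report failure if any of them failed
main() {
    local failed=0
    check_nginx_status  || failed=1
    check_nginx_process || failed=1
    check_nginx_port    || failed=1

    if [ "$failed" -ne 0 ]; then
        echo "One or more checks failed"
        return 1
    fi
    echo "All checks passed"
}

main
```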
Operations commands:
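```bash
# Reload the configuration (without interrupting service)
nginx -s reload

# Graceful shutdown
nginx -s quit

# Fast shutdown
nginx -s stop

# Reopen log files
nginx -s reopen

# Test the configuration
nginx -t

# Show the version
nginx -v

# Show the version and build options
nginx -V
```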
Log rotation:
```bash
# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 nginx adm
    sharedscripts
    postrotate
        # USR1 tells Nginx to reopen its log files
        if [ -f /var/run/nginx.pid ]; then
            kill -USR1 "$(cat /var/run/nginx.pid)"
        fi
    endscript
}
```
Complete monitoring configuration example:
```nginx
user nginx;
worker_processes auto;

# An events block is required for the configuration to load; 1024 is a placeholder value
events {
    worker_connections 1024;
}

http {
    # Log formats
    log_format main '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

    log_format performance '$remote_addr - $remote_user [$time_local] '
                           '"$request" $status $body_bytes_sent '
                           'rt=$request_time '
                           'uct=$upstream_connect_time '
                           'uht="$upstream_header_time" '
                           'urt="$upstream_response_time" '
                           'cache=$upstream_cache_status';

    log_format json_combined escape=json '{'
        '"time_local":"$time_local",'
        '"remote_addr":"$remote_addr",'
        '"remote_user":"$remote_user",'
        '"request":"$request",'
        '"status":"$status",'
        '"body_bytes_sent":"$body_bytes_sent",'
        '"request_time":"$request_time",'
        '"http_referrer":"$http_referer",'
        '"http_user_agent":"$http_user_agent"'
    '}';

    # Backend pool referenced by proxy_pass; the address is a placeholder
    upstream backend {
        server 127.0.0.1:8080;
    }

    # Main site
    server {
        listen 80;
        server_name example.com;
        root /var/www/html;
        index index.html;

        # Performance log
        access_log /var/log/nginx/performance.log performance;
        error_log /var/log/nginx/error.log warn;

        # Monitoring endpoints
        location /nginx_status {
            stub_status;
            access_log off;
            allow 127.0.0.1;
            deny all;
        }

        location /health {
            access_log off;
            default_type text/plain;
            return 200 "OK\n";
        }

        location /ready {
            access_log off;
            proxy_pass http://backend/health;
            proxy_intercept_errors off;
        }

        location /metrics {
            proxy_pass http://localhost:9113/metrics;
            access_log off;
            allow 127.0.0.1;
            deny all;
        }

        location / {
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
```
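Recommended monitoring tools:
- Prometheus + Grafana: metrics collection with dashboards and visualization
- ELK Stack: log collection, storage, and analysis
- Zabbix: enterprise-grade monitoring system
- Nagios: mature monitoring solution
- Datadog: cloud-hosted monitoring service
- New Relic: application performance monitoring
- AppDynamics: application performance management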
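Operations best practices:
- Comprehensive monitoring: monitor performance, logs, and resource usage
- Timely alerting: set reasonable alert thresholds
- Regular backups: back up configuration and important data
- Automation: automate routine operations with scripts and tools
- Documentation: record operational changes and incidents in detail
- Regular drills: rehearse failure scenarios on a schedule
- Performance tuning: continuously monitor and optimize performance
- Security audits: run security checks regularly
- Capacity planning: plan capacity according to business growth
- Continuous improvement: keep refining the setup based on monitoring data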