How to configure Prometheus alert rules and Alertmanager? - Interview Question

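Prometheus evaluates alert rules and sends firing alerts to Alertmanager, which handles grouping, routing, inhibition, silencing, and notification delivery.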

Alert Rule Configuration:

yaml
groups:
  - name: example_alerts
    rules:
      - alert: HighCPUUsage
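        # Busy CPU % per instance: 100 minus the idle share averaged across all cores
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        # The condition must hold for 5 minutes before the alert fires
        for: 5m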
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

Key Fields:

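expr: PromQL expression to evaluate; every time series it returns becomes an alert
for: How long the expression must keep returning a series before the alert moves from pending to firing
labels: Extra labels attached to the alert, used by Alertmanager for routing, grouping, and inhibition
annotations: Descriptive, non-identifying text such as summary and description; values can use templates like {{ $labels.instance }} and {{ $value }}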
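
For these rules to take effect, Prometheus has to load the rule file and know where Alertmanager is running. A minimal `prometheus.yml` sketch follows, assuming the rules above are saved as `alert_rules.yml` next to the main config and that Alertmanager listens on its default port 9093 on the same host. The file name, the target address, and the evaluation interval are all assumptions.

```yaml
global:
  evaluation_interval: 15s   # how often rules are evaluated (assumed value)

rule_files:
  - 'alert_rules.yml'        # hypothetical file holding the rule groups

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # assumed Alertmanager address
```

The rule file can be checked before a reload with `promtool check rules alert_rules.yml`.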

Alertmanager Configuration:

yaml
route:
  group_by: ['alertname', 'cluster']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'default'

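receivers:
  - name: 'default'
    email_configs:
      - to: 'alert@example.com'
        from: 'prometheus@example.com'
        # An SMTP relay is required, here or as global.smtp_smarthost; this host is a placeholder
        smarthost: 'smtp.example.com:587'
    webhook_configs:
      - url: 'http://webhook.example.com/alert'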

Alert Grouping:

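group_by: Labels used to group alerts; alerts with the same values for these labels are batched into one notification
group_wait: How long to wait before sending the first notification for a new group, so alerts that fire close together are batched
group_interval: How long to wait before sending a notification about new alerts added to a group that has already been notified
repeat_interval: How long to wait before re-sending a notification for a group whose alerts are still firing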
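
As a rough illustration of how these timers combine with routing, the sketch below sends critical alerts to a separate receiver with a shorter wait. The receiver name `oncall`, the webhook URLs, and all timer values are hypothetical, and the `matchers` syntax assumes Alertmanager 0.22 or later.

```yaml
route:
  receiver: 'default'              # fallback for anything not matched below
  group_by: ['alertname', 'cluster']
  group_wait: 30s                  # first notification for a new group is held 30s
  group_interval: 5m               # additions to a notified group are sent at most every 5m
  repeat_interval: 12h             # unchanged, still-firing groups are re-sent after 12h
  routes:
    - matchers:
        - severity="critical"
      receiver: 'oncall'           # hypothetical receiver name
      group_wait: 10s              # notify faster for critical alerts
      repeat_interval: 1h

receivers:
  - name: 'default'
    webhook_configs:
      - url: 'http://webhook.example.com/alert'
  - name: 'oncall'
    webhook_configs:
      - url: 'http://webhook.example.com/oncall'   # placeholder URL
```

Child routes inherit any setting they do not override from the parent route. With these values, two non-critical alerts in the same group that fire a few seconds apart should arrive in one notification about 30 seconds after the first; an alert joining the group later is delivered with the next group update. The file can be validated with `amtool check-config`.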

Alert Inhibition:

yaml
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
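
The rule above mutes a warning alert while a critical alert with the same `alertname` and `instance` is firing, which only helps if both severity tiers share an alert name. `source_match` and `target_match` are deprecated from Alertmanager 0.22 onward in favor of matcher lists. A sketch of the equivalent rule in the newer form, assuming such a version:

```yaml
inhibit_rules:
  - source_matchers:
      - severity="critical"            # alerts that do the muting
    target_matchers:
      - severity="warning"             # alerts that get muted
    equal: ['alertname', 'instance']   # both alerts must agree on these labels
```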

Alert Silencing:

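Create silences through the Alertmanager web UI, the amtool CLI, or the HTTP API
Each silence has a start time, an end time, and a set of label matchers; matching alerts produce no notifications while it is active
Suitable for planned maintenance windows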

Best Practices:

Set reasonable alert thresholds to avoid alert fatigue
Use tiered alerts (info, warning, critical)
Regularly review and optimize alert rules
Pair alerts with Grafana dashboards so responders can see the underlying metrics
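
Tiered alerting can be sketched as two rules over the same expression with different thresholds and severities. The group name, the 95% threshold, and the 2m duration for the critical tier are illustrative assumptions, not recommended values.

```yaml
groups:
  - name: cpu_tiered_alerts        # hypothetical group name
    rules:
      # Warning tier: sustained load above 80%
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
      # Critical tier: same expression, higher threshold (assumed), shorter wait (assumed)
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 95
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Very high CPU usage on {{ $labels.instance }}"
```

Because both tiers share the alert name and the `instance` label, an inhibition rule that compares `alertname` and `instance` can suppress the warning while the critical alert is firing.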