How to use Prometheus for monitoring in a microservices architecture? - Interview Question

Prometheus monitoring practices in microservices architecture:

Service Mesh Monitoring (Istio/Linkerd):

- Collect metrics from the sidecar proxies (Envoy in Istio, linkerd2-proxy in Linkerd)
- Monitor service-to-service call relationships
- Trace requests across the call chain
Configuration example:

yaml
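scrape_configs:
  # Scrapes the Istio control plane. Istio 1.5+ merged Pilot into istiod,
  # so on current releases the service is named istiod, not istio-pilot.
  - job_name: 'istio-pilot'
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [istio-system]
    relabel_configs:
      # Keep only endpoints that belong to the control-plane service
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: istio-pilot|istiod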
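
The job above covers only the control plane. To collect metrics from the sidecar proxies themselves, a second job is needed. The following is a minimal sketch, assuming Istio's default Envoy sidecar, which exposes stats at `/stats/prometheus` on a container port whose name ends in `-envoy-prom`. The job name and the two extra target labels are illustrative choices, not requirements.

```yaml
scrape_configs:
  - job_name: 'envoy-stats'
    # Envoy serves Prometheus-format stats on this path
    metrics_path: /stats/prometheus
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only the sidecar's metrics port (assumed name: http-envoy-prom)
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: '.*-envoy-prom'
      # Carry namespace and pod name onto every series for later grouping
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

If the mesh is installed with a different port name, the `regex` has to be adjusted to match it.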

Distributed Tracing Integration:

- Use OpenTelemetry to collect traces and metrics
- Integrate with Jaeger/Zipkin
- Correlate traces with metrics, for example through exemplars that attach a trace ID to a histogram sample
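
One way to wire these pieces together is an OpenTelemetry Collector that receives OTLP from the services, exposes the metrics for Prometheus to scrape, and forwards the traces to Jaeger. The sketch below assumes the Collector contrib distribution (which ships the `prometheus` exporter) and a Jaeger version that accepts OTLP. The hostname `jaeger-collector` and the ports are placeholders.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  # Exposes received metrics on an HTTP endpoint that Prometheus scrapes
  prometheus:
    endpoint: 0.0.0.0:8889
  # Forwards traces to Jaeger over OTLP (hostname is hypothetical)
  otlp/jaeger:
    endpoint: jaeger-collector:4317
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger]
```

Prometheus then needs an ordinary scrape job pointing at port 8889 of the Collector.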

Service Dependency Monitoring:

promql
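# Service-to-service p95 latency
# (assumes the histogram carries source and target labels,
# added by the mesh or by the services' own instrumentation)
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, source, target)
)

# Service error rate (fraction of requests answered with 5xx)
sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
/ sum(rate(http_requests_total[5m])) by (service)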
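
These queries can be turned into alerts with a Prometheus rule file. The following is a sketch only: the thresholds (5% errors, 1 second p95), the `for` durations, the alert names, and the severity values are illustrative assumptions, and the metric and label names are taken from the queries above.

```yaml
groups:
  - name: service-dependency
    rules:
      - alert: HighServiceErrorRate
        # Fires when more than 5% of a service's requests return 5xx
        expr: |
          sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum by (service) (rate(http_requests_total[5m]))
          > 0.05
        for: 10m
        labels:
          severity: P1
        annotations:
          summary: "High 5xx ratio on {{ $labels.service }}"

      - alert: SlowServiceDependency
        # Fires when p95 latency on a source -> target edge exceeds 1s
        expr: |
          histogram_quantile(0.95,
            sum by (le, source, target) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 1
        for: 10m
        labels:
          severity: P2
        annotations:
          summary: "p95 latency above 1s from {{ $labels.source }} to {{ $labels.target }}"
```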

Canary Deployment Monitoring:

- Use labels to distinguish versions
- Compare the performance of the old and new versions
- Alert on canary regressions to trigger automatic rollback

Configuration example:

yaml
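# Copy each pod's "version" label onto its scraped series
scrape_configs:
  - job_name: 'api'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # The default action is replace; pods without the label get no version label
      - source_labels: [__meta_kubernetes_pod_label_version]
        target_label: version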
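
With a `version` label on every series, the old and new versions can be compared directly in an alert rule. The sketch below assumes the label takes the hypothetical values `canary` and `stable` and reuses the `http_requests_total` metric from the earlier queries. The factor of 2 and the 5-minute hold are arbitrary starting points.

```yaml
groups:
  - name: canary
    rules:
      - alert: CanaryErrorRateRegression
        # Canary 5xx ratio is more than twice the stable ratio for the same service
        expr: |
          (
            sum by (service) (rate(http_requests_total{version="canary", status=~"5.."}[5m]))
            /
            sum by (service) (rate(http_requests_total{version="canary"}[5m]))
          )
          > 2 *
          (
            sum by (service) (rate(http_requests_total{version="stable", status=~"5.."}[5m]))
            /
            sum by (service) (rate(http_requests_total{version="stable"}[5m]))
          )
        for: 5m
        labels:
          severity: P1
        annotations:
          summary: "Canary of {{ $labels.service }} has a higher error rate than stable"
```

A rollout controller or a webhook receiver can act on this alert to roll back. Note that the comparison returns nothing while the stable version has no 5xx series at all, so pairing it with an absolute error-rate threshold on the canary is a sensible safeguard.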

SLA/SLO Monitoring:

promql
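# Error rate SLO: fewer than 1% of requests fail over 30 days.
# The comparison keeps only services that currently meet the target;
# use > 0.01 in an alert expression to find violators.
sum(rate(http_requests_total{status=~"5.."}[30d])) by (service)
/ sum(rate(http_requests_total[30d])) by (service) < 0.01

# Latency SLO: p99 below 0.5 seconds over 30 days
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[30d])) by (le, service)
) < 0.5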
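
Evaluating a 30-day range on every rule cycle is expensive and reacts slowly. A common alternative is multi-window burn-rate alerting on short windows backed by recording rules. The sketch below assumes the 1% error budget from the query above. The factor 14.4 corresponds to spending 2% of a 30-day budget within one hour. Rule and alert names are illustrative.

```yaml
groups:
  - name: slo-burn-rate
    rules:
      # Precompute the error ratio over a short and a long window
      - record: service:http_requests:error_ratio_rate5m
        expr: |
          sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum by (service) (rate(http_requests_total[5m]))
      - record: service:http_requests:error_ratio_rate1h
        expr: |
          sum by (service) (rate(http_requests_total{status=~"5.."}[1h]))
          /
          sum by (service) (rate(http_requests_total[1h]))

      - alert: ErrorBudgetFastBurn
        # Both windows must exceed 14.4x the budgeted error rate (0.01)
        expr: |
          service:http_requests:error_ratio_rate1h > (14.4 * 0.01)
          and
          service:http_requests:error_ratio_rate5m > (14.4 * 0.01)
        labels:
          severity: P0
        annotations:
          summary: "{{ $labels.service }} is burning its 30-day error budget too fast"
```

The short window makes the alert resolve quickly once the problem stops, while the long window keeps brief spikes from paging anyone.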

Best Practices:

Unified Naming Conventions:
- Use standardized metric names
- Maintain label consistency
- Document metric meanings
Service-Level Metrics:
- RED method for request-driven services: Rate, Errors, Duration
- USE method for resources such as CPU, memory, and disks: Utilization, Saturation, Errors
Automated Monitoring:
- Auto-discover services via annotations
- Use the Prometheus Operator for automatic configuration (ServiceMonitor and PodMonitor resources)
- Infrastructure as Code
Alerting Strategy:
- Tiered alerts (P0/P1/P2/P3)
- Alert inhibition and grouping
- On-call rotation and escalation policies
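
The alerting strategy above maps onto Alertmanager's routing tree and inhibition rules. The following is a minimal sketch, assuming alerts carry `severity` (P0 to P3) and `service` labels and an Alertmanager version that supports the `matchers` syntax (0.22 or later). The receiver names are hypothetical, and their notification integrations are left out.

```yaml
route:
  receiver: default
  # Group related alerts into one notification per alert name and service
  group_by: [alertname, service]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # P0 pages the on-call engineer quickly and repeats often
    - matchers:
        - severity="P0"
      receiver: pager
      group_wait: 10s
      repeat_interval: 30m
    # P1 and P2 go to a chat channel
    - matchers:
        - severity=~"P1|P2"
      receiver: chat

inhibit_rules:
  # While a P0 fires for a service, mute its lower-severity alerts
  - source_matchers:
      - severity="P0"
    target_matchers:
      - severity=~"P1|P2|P3"
    equal: [service]

receivers:
  - name: default
  - name: pager
  - name: chat
```

P3 alerts fall through to the `default` receiver here. Escalation and on-call rotation are usually handled by the paging tool behind the `pager` receiver, not by Alertmanager itself.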