How to use Prometheus for monitoring in microservices architecture?

February 21, 15:40

Prometheus monitoring practices in microservices architecture:

Service Mesh Monitoring (Istio/Linkerd):

  • Collect metrics using Sidecar proxies
  • Monitor service-to-service call relationships
  • Trace request chains
  • Configuration example:
yaml
scrape_configs:
  - job_name: 'istio-pilot'
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [istio-system]
    relabel_configs:
      # Keep only the endpoints that belong to the istio-pilot service
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: istio-pilot
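
The job above scrapes only the Istio control plane. To collect metrics from the sidecar proxies themselves, a second job can target the Envoy stats endpoint on each pod. The following is a minimal sketch, assuming the sidecars expose their stats on a container port whose name ends in `-envoy-prom` and serve them at `/stats/prometheus`. The job name is illustrative, and the port name and path should be checked against the installed Istio version.

```yaml
scrape_configs:
  - job_name: 'envoy-stats'             # illustrative job name
    metrics_path: /stats/prometheus     # assumed Envoy stats path
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only container ports that look like the Envoy metrics port
      # (assumed naming convention: "<something>-envoy-prom")
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: '.*-envoy-prom'
      # Copy namespace and pod name into the scraped series for grouping
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```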

Distributed Tracing Integration:

  • Use OpenTelemetry to collect traces and metrics
  • Export traces to Jaeger or Zipkin
  • Correlate traces with Prometheus metrics, for example through exemplars or shared labels

Service Dependency Monitoring:

promql
# Service-to-service latency (95th percentile)
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, source, target)
)

# Service error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
/
sum(rate(http_requests_total[5m])) by (service)

Canary Deployment Monitoring:

  • Use labels to distinguish versions
  • Compare performance of old and new versions
  • Alert on regressions so that a rollback can be triggered automatically

Configuration Example:

yaml
# Use version labels
scrape_configs:
  - job_name: 'api'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Copy the pod's "version" label onto every scraped series
      - source_labels: [__meta_kubernetes_pod_label_version]
        target_label: version
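
Once every series carries a `version` label, the old and new versions can be compared in an alerting rule. The following is a minimal sketch, assuming pods are labeled `version="canary"` and `version="stable"` (hypothetical values) and that `http_requests_total` carries `service` and `status` labels as in the queries in this article. The rule name, the 2x factor, and the 5-minute window are illustrative choices, not recommendations.

```yaml
groups:
  - name: canary-comparison
    rules:
      - alert: CanaryErrorRateRegression
        # Fires when the canary's 5xx ratio is more than twice the stable ratio.
        # Both sides are aggregated by service, so they match one-to-one.
        expr: |
          (
            sum(rate(http_requests_total{version="canary", status=~"5.."}[5m])) by (service)
            /
            sum(rate(http_requests_total{version="canary"}[5m])) by (service)
          )
          > 2 *
          (
            sum(rate(http_requests_total{version="stable", status=~"5.."}[5m])) by (service)
            /
            sum(rate(http_requests_total{version="stable"}[5m])) by (service)
          )
        for: 5m
        labels:
          severity: P1
        annotations:
          summary: "Canary error rate for {{ $labels.service }} is more than 2x the stable version"
```

Prometheus only raises the alert. The rollback itself has to be performed by whatever deployment tooling receives it. Note that the comparison returns nothing when the stable version has no 5xx series at all, so an absolute threshold on the canary may be needed as a complement.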

SLA/SLO Monitoring:

promql
# Error rate SLO
sum(rate(http_requests_total{status=~"5.."}[30d])) by (service)
/
sum(rate(http_requests_total[30d])) by (service)
< 0.01

# Latency SLO
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[30d])) by (le, service)
) < 0.5
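
The SLO expressions above return the services that currently meet the target. An alert needs the inverted comparison, and because a 30-day range is expensive to evaluate on every query, it can help to precompute the ratio with a recording rule. The following is a sketch under those assumptions. The rule names and the `severity` value are illustrative, and the 1% threshold is taken from the query above.

```yaml
groups:
  - name: slo-error-rate
    rules:
      # Precompute the 30-day error ratio per service
      - record: service:http_requests:error_ratio_rate30d
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[30d])) by (service)
          /
          sum(rate(http_requests_total[30d])) by (service)

      # Fire when a service violates the 1% error rate SLO
      - alert: ErrorRateSLOViolated
        expr: service:http_requests:error_ratio_rate30d > 0.01
        for: 5m
        labels:
          severity: P1
        annotations:
          summary: "{{ $labels.service }} exceeds the 1% error rate SLO over 30 days"
```

The 30-day window also requires Prometheus (or its long-term storage) to retain at least 30 days of data.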

Best Practices:

  1. Unified Naming Conventions:

    • Use standardized metric names
    • Maintain label consistency
    • Document metric meanings
  2. Service-Level Metrics:

    • RED method: Rate, Errors, Duration
    • USE method: Utilization, Saturation, Errors
  3. Automated Monitoring:

    • Auto-discover services via annotations
    • Use Operator for automatic configuration
    • Infrastructure as Code
  4. Alerting Strategy:

    • Tiered alerts (P0/P1/P2/P3)
    • Alert inhibition and aggregation
    • On-call rotation and escalation policies
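
The tiered alerting and inhibition strategy can be sketched as an Alertmanager configuration. This is a minimal example, assuming alerts carry a `severity` label with the values P0 to P3 and a `service` label. The receiver names are placeholders with no notification integrations configured, and the `matchers` syntax assumes a reasonably recent Alertmanager release.

```yaml
route:
  receiver: default
  # Aggregate alerts for the same alert name and service into one notification
  group_by: [alertname, service]
  routes:
    - matchers:
        - severity = "P0"
      receiver: pager          # placeholder: page the on-call engineer
    - matchers:
        - severity =~ "P1|P2"
      receiver: chat           # placeholder: post to a team channel

inhibit_rules:
  # While a P0 alert is firing for a service, mute lower tiers for that service
  - source_matchers:
      - severity = "P0"
    target_matchers:
      - severity =~ "P1|P2|P3"
    equal: [service]

receivers:
  - name: default
  - name: pager
  - name: chat
```

On-call rotation and escalation are usually handled by the paging system behind the `pager` receiver rather than by Alertmanager itself.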
Tags: Prometheus