Prometheus monitoring practices in a microservices architecture:
Service Mesh Monitoring (Istio/Linkerd):
- Collect metrics from the sidecar proxies (Envoy in Istio, linkerd2-proxy in Linkerd)
- Monitor service-to-service call relationships
- Trace request chains
- Configuration example:
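```yaml
scrape_configs:
  - job_name: 'istio-pilot'
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [istio-system]
    relabel_configs:
      # Keep only endpoints that belong to the istio-pilot service.
      # On Istio 1.5+ the control plane service is named istiod; adjust the regex accordingly.
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: istio-pilot
```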
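The control-plane job above does not cover the sidecars themselves. A minimal sketch of a job that scrapes the Envoy sidecars directly, assuming Istio's default setup in which each proxy serves metrics at `/stats/prometheus` on a container port whose name ends in `-envoy-prom` (the job name `envoy-stats` is illustrative):

```yaml
scrape_configs:
  - job_name: 'envoy-stats'            # illustrative job name
    metrics_path: /stats/prometheus    # Envoy's Prometheus endpoint in Istio sidecars
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only the sidecar's metrics port (assumed name: http-envoy-prom)
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: '.*-envoy-prom'
```

Discovering by pod rather than by endpoints means every injected workload is picked up without needing a Service in front of it.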
Distributed Tracing Integration:
- Use OpenTelemetry to collect traces and metrics
- Integrate with Jaeger/Zipkin
- Correlate traces and monitoring data
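One possible way to wire these pieces together is an OpenTelemetry Collector that receives OTLP from the services, forwards traces to Jaeger, and exposes metrics for Prometheus to scrape. The sketch below assumes the Collector contrib distribution (which ships the `prometheus` exporter) and a Jaeger version that accepts OTLP natively; the hostname `jaeger-collector` and the port `8889` are placeholders:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317        # services send traces and metrics here

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889            # Prometheus scrapes this address (placeholder port)
  otlp/jaeger:
    endpoint: jaeger-collector:4317   # placeholder hostname for the Jaeger collector
    tls:
      insecure: true                  # plaintext for the sketch only

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```

Routing both signals through one Collector keeps resource attributes such as the service name consistent, which makes it easier to correlate a trace with the matching time series.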
Service Dependency Monitoring:
```promql
# Service-to-service latency
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, source, target)
)

# Service error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
/
sum(rate(http_requests_total[5m])) by (service)
```
Canary Deployment Monitoring:
- Use labels to distinguish versions
- Compare performance of old and new versions
- Alert on canary regressions so that a rollback can be triggered automatically
Configuration Example:
```yaml
# Use version labels
scrape_configs:
  - job_name: 'api'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_version]
        target_label: version
```
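With the `version` label in place, the old and new versions can be compared in an alerting rule. A minimal sketch, assuming the pods are labeled `version=canary` and `version=stable` (hypothetical values) and expose the `http_requests_total` counter used elsewhere in these notes; the factor of 2 and the alert name are arbitrary illustrations:

```yaml
groups:
  - name: canary
    rules:
      - alert: CanaryErrorRateHigh     # hypothetical alert name
        expr: |
          (
            sum(rate(http_requests_total{job="api", version="canary", status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{job="api", version="canary"}[5m]))
          )
          > 2 *
          (
            sum(rate(http_requests_total{job="api", version="stable", status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{job="api", version="stable"}[5m]))
          )
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Canary error rate is more than twice the stable error rate"
```

If the stable version has no 5xx series at all, the right-hand side is empty and the rule never fires, so a real setup would likely pair this with an absolute threshold on the canary error rate. The alert can then be routed to a webhook that performs the rollback.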
SLA/SLO Monitoring:
```promql
# Error rate SLO
sum(rate(http_requests_total{status=~"5.."}[30d])) by (service)
/
sum(rate(http_requests_total[30d])) by (service)
< 0.01

# Latency SLO
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[30d])) by (le, service)
) < 0.5
```
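Queries over a 30-day range are expensive to evaluate and react slowly to an outage. A common complement is a multiwindow burn-rate alert built on recording rules. The sketch below assumes the same 1% error budget as the query above; the rule and alert names are hypothetical. A burn rate of 14.4 means that 2% of a 30-day budget is consumed in one hour (14.4 × 1 h / 720 h = 0.02).

```yaml
groups:
  - name: slo
    rules:
      # Error ratio per service over a long and a short window
      - record: service:http_requests:error_ratio_rate1h
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[1h])) by (service)
          /
          sum(rate(http_requests_total[1h])) by (service)
      - record: service:http_requests:error_ratio_rate5m
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
          /
          sum(rate(http_requests_total[5m])) by (service)
      # Fire only when both windows exceed 14.4x the allowed 1% error ratio
      - alert: ErrorBudgetFastBurn     # hypothetical alert name
        expr: |
          service:http_requests:error_ratio_rate1h > (14.4 * 0.01)
          and
          service:http_requests:error_ratio_rate5m > (14.4 * 0.01)
        for: 2m
        labels:
          severity: critical
```

The short window lets the alert resolve quickly once the errors stop, while the long window filters out brief spikes.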
Best Practices:
Unified Naming Conventions:
- Use standardized metric names
- Maintain label consistency
- Document metric meanings
Service-Level Metrics:
- RED method (for request-driven services): Rate, Errors, Duration
- USE method (for resources such as CPU, memory, and disk): Utilization, Saturation, Errors
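The RED signals correspond to the request rate, error rate, and latency queries shown earlier. For USE, a minimal sketch of recording rules follows, assuming node_exporter is deployed and its standard metrics (`node_cpu_seconds_total`, `node_load1`, `node_network_receive_errs_total`) are scraped; the recorded rule names are illustrative:

```yaml
groups:
  - name: use-method
    rules:
      # Utilization: fraction of CPU time not spent idle
      - record: instance:node_cpu_utilization:rate5m
        expr: |
          1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      # Saturation: 1-minute load average per CPU core
      - record: instance:node_load1_per_cpu:ratio
        expr: |
          node_load1
          / on (instance)
          count by (instance) (node_cpu_seconds_total{mode="idle"})
      # Errors: network receive errors per second
      - record: instance:node_network_receive_errs:rate5m
        expr: |
          sum by (instance) (rate(node_network_receive_errs_total[5m]))
```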
Automated Monitoring:
- Auto-discover services via annotations
- Use the Prometheus Operator (ServiceMonitor/PodMonitor resources) for automatic configuration
- Infrastructure as Code
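Annotation-based discovery is usually implemented with relabeling. The sketch below follows the widely used `prometheus.io/*` annotation convention; these annotations are not built into Prometheus and only take effect because the relabel rules read them. The job name is illustrative.

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'        # illustrative job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Scrape only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Honor a custom metrics path from prometheus.io/path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      # Honor a custom port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

A new service then opts in by adding the annotations to its pod template, with no change to the Prometheus configuration.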
Alerting Strategy:
- Tiered alerts (P0/P1/P2/P3)
- Alert inhibition and grouping (aggregation of related alerts into one notification)
- On-call rotation and escalation policies
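A minimal Alertmanager sketch of tiering, grouping, and inhibition, assuming alerts carry a `severity` label with the values P0 to P3 and a `service` label, and using the `matchers` syntax available in Alertmanager 0.22 and later. The receiver names are placeholders and have no notification integrations configured here:

```yaml
route:
  receiver: default
  group_by: [alertname, service]   # aggregate related alerts into one notification
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - 'severity="P0"'
      receiver: pager              # placeholder: would page the on-call engineer

receivers:
  - name: default                  # placeholder receiver without integrations
  - name: pager                    # placeholder receiver without integrations

inhibit_rules:
  # While a P0 alert is firing for a service, mute lower tiers for the same service
  - source_matchers:
      - 'severity="P0"'
    target_matchers:
      - 'severity=~"P1|P2|P3"'
    equal: [service]
```

On-call rotation and escalation are typically handled by the paging tool behind the receiver rather than by Alertmanager itself.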