Prometheus storage optimization and performance tuning strategies:
Data Retention Policy:
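Retention limits are set as command-line flags when starting Prometheus:
```shell
prometheus \
  --config.file=prometheus.yml \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=10GB
```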
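- Set the retention time based on available disk space and how far back queries need to reach
- Use `--storage.tsdb.retention.size` to cap disk usage; if both limits are set, whichever is reached first triggers deletion of the oldest data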
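As a rough sizing check, the Prometheus storage documentation estimates local disk needs as retention time × ingested samples per second × bytes per sample, with about 1-2 bytes per sample on average. The ingestion rate of 10,000 samples/s below is a hypothetical figure; measure your own with `rate(prometheus_tsdb_head_samples_appended_total[5m])`.

$$
\text{disk} \approx t_{\text{retention}} \times r_{\text{ingest}} \times b_{\text{sample}}
= (15 \times 86\,400\ \text{s}) \times 10\,000\ \tfrac{\text{samples}}{\text{s}} \times 2\ \tfrac{\text{bytes}}{\text{sample}}
\approx 2.6 \times 10^{10}\ \text{bytes} \approx 26\ \text{GB}
$$

Under these assumptions a 15d time limit would need roughly 26 GB, so a 10GB size cap would take effect first and the effective retention would be closer to 6 days.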
Scraping Optimization:
- Set a reasonable `scrape_interval` (recommended 15s-60s)
- Set `scrape_timeout` so that slow targets cannot stall scrapes; it must not exceed `scrape_interval`
- Use longer scrape intervals for less important metrics
- Use `metric_relabel_configs` to drop unnecessary metrics before they are stored
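A minimal `prometheus.yml` sketch of the scraping settings above. The job names, targets, and the dropped `go_gc_.*` metrics are hypothetical placeholders; pick the metrics to drop based on what your dashboards and alerts actually use.

```yaml
global:
  scrape_interval: 30s          # default for every job
  scrape_timeout: 10s           # must not exceed scrape_interval

scrape_configs:
  - job_name: api               # hypothetical high-value job, uses the global 30s
    static_configs:
      - targets: ["api:8080"]   # hypothetical target

  - job_name: batch-workers     # hypothetical low-priority job
    scrape_interval: 60s        # scraped half as often
    static_configs:
      - targets: ["worker:9100"]
    metric_relabel_configs:
      # Drop series whose metric name matches (regex is fully anchored)
      - source_labels: [__name__]
        regex: "go_gc_.*"
        action: drop
```

`metric_relabel_configs` runs after the scrape but before ingestion, so dropped series never reach the TSDB.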
Query Optimization:
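- Avoid selecting every series of a metric; narrow queries with label matchers such as `{job="api"}`
- Keep range windows (e.g. `[5m]`) and dashboard time ranges no larger than the question requires
- Use recording rules to precompute common or expensive queries
- Schedule heavy queries (reports, large dashboards) outside peak periods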
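Expensive dashboard queries can be moved into a rule group that is evaluated on a fixed schedule and filtered by labels up front. A sketch; the group name, the `status` label, and the 1m interval are hypothetical.

```yaml
groups:
  - name: api_error_rules
    interval: 1m                # per-group override of the global evaluation_interval
    rules:
      # Label matchers restrict the read to one job's 5xx responses
      # instead of every http_requests_total series.
      - record: job:http_requests_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{job="api", status=~"5.."}[5m]))
```

Dashboards can then read the single precomputed series instead of re-evaluating the full expression on every refresh.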
Memory Optimization:
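- Note that `--storage.tsdb.retention.time` mainly affects disk usage; memory is driven mostly by the number of active series held in the in-memory head block
- Use `--storage.tsdb.head-chunks-write-queue-size` to control the size of the queue that writes head chunks to disk
- Monitor memory usage and reduce series cardinality (unused metrics, high-cardinality labels) to keep memory in check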
- Consider using Thanos or VictoriaMetrics for long-term storage
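One way to keep the series count, and therefore head-block memory, under control is to cap samples per scrape and drop labels with unbounded values. A sketch; the job, target, limit, and the `request_id` label are hypothetical.

```yaml
scrape_configs:
  - job_name: app               # hypothetical job
    sample_limit: 10000         # if more samples remain after metric relabeling, the whole scrape fails
    static_configs:
      - targets: ["app:8080"]   # hypothetical target
    metric_relabel_configs:
      # Remove a hypothetical per-request label that would create a new series per value
      - regex: request_id
        action: labeldrop
```

Before using `labeldrop`, check that the remaining labels still identify each series uniquely; otherwise distinct series collapse into duplicates.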
Recording Rules Example:
```yaml
groups:
  - name: api_rules
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```
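The `api_rules` group only runs once Prometheus loads the file that contains it. A sketch, assuming the group is saved to a hypothetical `rules/api_rules.yml`:

```yaml
# prometheus.yml
global:
  evaluation_interval: 30s      # how often rule groups are evaluated (default 1m)

rule_files:
  - "rules/api_rules.yml"       # hypothetical path to the file holding the api_rules group
```

`promtool check rules rules/api_rules.yml` validates the file before reloading, and dashboards can then query `job:http_requests:rate5m` directly.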
Monitoring Prometheus Itself:
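- `prometheus_tsdb_compaction_duration_seconds`: duration of TSDB compaction runs
- `prometheus_tsdb_head_samples_appended_total`: total samples ingested (apply `rate()` for samples per second)
- `prometheus_target_interval_length_seconds`: actual intervals between scrapes, useful for spotting delayed scrapes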
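These self-monitoring metrics can be turned into recording rules. A sketch, assuming Prometheus scrapes its own `/metrics` endpoint; the group and record names are hypothetical, and it relies on the compaction metric being exported as the histogram `prometheus_tsdb_compaction_duration_seconds` (with `_sum` and `_count` series).

```yaml
groups:
  - name: prometheus_self
    rules:
      # Samples ingested per second, the main driver of disk and memory growth
      - record: instance:prometheus_tsdb_head_samples_appended:rate5m
        expr: rate(prometheus_tsdb_head_samples_appended_total[5m])
      # Average compaction duration over the last 6h (histogram sum / count);
      # yields no value if no compaction ran in the window
      - record: instance:prometheus_tsdb_compaction_duration_seconds:avg6h
        expr: |
          increase(prometheus_tsdb_compaction_duration_seconds_sum[6h])
            / increase(prometheus_tsdb_compaction_duration_seconds_count[6h])
```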
Best Practices:
- Regularly audit and remove metrics that nothing queries
- Use a federation architecture to distribute scrape and query load
- Consider remote write to a long-term store to separate recent (hot) data from older (cold) data
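Federation and remote write are both configured in `prometheus.yml`. A sketch: the source server name and the remote write URL are hypothetical, and the exact write path depends on the backend (for example Thanos Receive or VictoriaMetrics).

```yaml
# Global Prometheus: pull only pre-aggregated "job:" series from a lower-level server
scrape_configs:
  - job_name: federate
    honor_labels: true            # keep labels as exposed by the source server
    metrics_path: /federate
    params:
      "match[]":
        - '{__name__=~"job:.*"}'  # only recording-rule outputs
    static_configs:
      - targets: ["prometheus-dc1:9090"]   # hypothetical source server

# Ship samples to a long-term store so local retention can stay short
remote_write:
  - url: "http://long-term-store.example/api/v1/write"   # hypothetical endpoint
```

Federating only recorded `job:` series keeps the global server's series count small, which is the point of a hierarchical setup.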