
How to Design and Implement Elasticsearch's Hot-Cold Architecture?

March 6, 21:11

In modern big data applications, Elasticsearch serves as a distributed search and analytics engine, and both its performance and its cost need deliberate optimization. As data volumes grow, a cluster of identical nodes struggles to deliver high throughput, low latency, and cost-effective storage at the same time. Hot-Cold Architecture addresses this by partitioning data by access frequency. Hot Data consists of recently written, frequently queried indices such as current logs or real-time transaction data; Cold Data consists of historical or rarely accessed indices such as archived logs. Hot Data is stored on high-performance nodes to accelerate queries, while Cold Data is migrated to low-cost nodes to reduce storage overhead. This article covers the design principles, implementation details, and best practices of Hot-Cold Architecture, to help developers build efficient, scalable Elasticsearch deployments.

Overview of Hot-Cold Architecture

Definition and Background

The core concept of Hot-Cold Architecture is allocating resources dynamically according to data lifecycle. Hot Data refers to recently active indices that require high I/O and low-latency access; Cold Data refers to historical or infrequently accessed indices that can tolerate higher latency in exchange for low-cost storage. Elasticsearch supports this natively: Index Lifecycle Management (ILM) arrived in the 6.x series, Data Streams in 7.9, and dedicated data tier node roles in 7.10, which together remove most of the manual shard management.

Why is Hot-Cold Architecture needed?

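  • Cost Optimization: Moving Cold Data to cheaper storage can cut its storage cost substantially; reductions of over 60% are cited from AWS S3 vs. EBS benchmark comparisons, though actual savings depend on hardware and retention.
  • Performance Improvement: Dedicated Hot nodes can reduce query latency, by around 40% according to Elastic Stack performance reports.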
  • Scalability: Supports dynamic data growth, preventing single-cluster overload.

Key Components

Hot-Cold Architecture relies on these core components:

  • Hot Nodes: Equipped with SSD storage, high CPU, and memory, optimized for indexing and searching.
  • Cold Nodes: Use HDD storage and low-cost instances; they hold indices that are no longer written to and serve only occasional queries.
  • Index Lifecycle Management (ILM): Automates the lifecycle by rolling indices over based on age or size and moving them between node tiers once they reach a phase's minimum age.
  • Data Streams: Simplifies index management by automatically creating time-partitioned indices.

Design Principles

Data Lifecycle Management

When designing Hot-Cold Architecture, define clear data lifecycle stages:

  • Hot Stage: Data within 7 days of creation, used for high-frequency queries.
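  • Warm Stage (optional): Data between 7 and 30 days old, no longer written to but still read.
  • Cold Stage: Older data that is rarely queried. It remains searchable, at higher latency, until the retention limit (90 days in this article's example policy) removes it.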

Design Considerations:

  • Set thresholds based on business scenarios: For example, log-based applications typically configure a rollover condition of max_age: 7d in the Hot stage.
  • Avoid overcomplication: The Warm stage is optional; data can transition directly from Hot to Cold to simplify the architecture.
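
The stage boundaries above can be expressed as a small lookup. The following is a minimal sketch, assuming the 7/30/90-day thresholds from this section and assuming that data is dropped at the 90-day mark; `phase_for_age` is a hypothetical helper, not an Elasticsearch API. In a real cluster, ILM makes this decision from each phase's min_age.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the stages above; tune them per workload (assumption).
HOT_DAYS, WARM_DAYS, RETENTION_DAYS = 7, 30, 90


def phase_for_age(created: datetime, now: datetime, use_warm: bool = True) -> str:
    """Return the lifecycle stage an index of this age would fall into."""
    age_days = (now - created) / timedelta(days=1)
    if age_days < HOT_DAYS:
        return "hot"
    if use_warm and age_days < WARM_DAYS:
        return "warm"
    if age_days < RETENTION_DAYS:
        return "cold"
    return "delete"


now = datetime(2023, 10, 31, tzinfo=timezone.utc)
for days in (1, 10, 45, 120):
    print(days, phase_for_age(now - timedelta(days=days), now))
```

Passing `use_warm=False` models the simplified Hot-to-Cold transition: an index that is 10 days old is then classified as cold instead of warm.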

Sharding Strategy

Sharding strategies must align with Hot-Cold nodes:

  • Hot Data Shards: Allocated to Hot Nodes; keep shard size below 50GB so that recovery and rebalancing stay fast.
  • Cold Data Shards: Migrated to Cold Nodes; shards larger than 50GB are acceptable here, because fewer, bigger shards reduce per-shard overhead on rarely queried data.

Best Practices:

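  • Size number_of_shards from the expected index volume; a single primary shard is often enough for an index that rolls over at 50GB. Allocation filtering applies to a whole index, so its shards are never split between Hot and Cold nodes.
  • Consider index.codec: best_compression to reduce the storage footprint, mainly for data that will age into the Cold stage, since it trades some CPU for space.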
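
The shard-size guideline translates into simple arithmetic when choosing a primary shard count. This is a rough sketch, assuming the 50GB target mentioned above; `primary_shard_count` is a hypothetical helper, and real sizing should also account for node count and query patterns.

```python
import math


def primary_shard_count(expected_index_gb: float, target_shard_gb: float = 50.0) -> int:
    """Estimate how many primary shards keep each shard at or below the target size."""
    if expected_index_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be non-negative and the target must be positive")
    # Every index needs at least one primary shard, even when it is tiny.
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


for size_gb in (10, 50, 120):
    print(size_gb, "GB ->", primary_shard_count(size_gb), "primary shard(s)")
```

An index that rolls over at 50GB fits in one primary shard under this estimate, while an index expected to reach 120GB before rollover would need three.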

Implementation Steps

Configure ILM Policy

ILM is the foundation for implementing Hot-Cold Architecture. Define the policy with PUT /_ilm/policy/<policy-name>, using a request body that specifies the rollover, migration, and deletion rules:

json
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "7d" }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": { "require": { "data": "cold" } }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}

Key Configuration Notes:

  • rollover: Automatically rolls over the index when its size reaches 50GB or its age hits 7 days.
  • cold: Thirty days after rollover (an illustrative value matching the end of the optional Warm window), the allocate action requires nodes with node.attr.data: cold, which relocates the index's shards there.
  • delete: Automatically deletes the index 90 days after rollover.
  • Index patterns (logs-*), data stream settings, and Hot placement belong in the index template rather than the policy: set index.lifecycle.name and index.routing.allocation.require.data: hot there. Both require rules depend on prior node attribute configuration.
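
The rules described above can also be assembled programmatically before being sent to the cluster. The sketch below only builds the request bodies for PUT /_ilm/policy/<name> and PUT /_index_template/<name>; the function names, the policy name "hot-cold-policy", and the 30-day cold threshold are assumptions for illustration, not values fixed by Elasticsearch.

```python
import json


def build_ilm_policy(rollover_size="50gb", rollover_age="7d",
                     cold_after="30d", delete_after="90d"):
    """Body for PUT /_ilm/policy/<name>: hot rollover, cold relocation, delete."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": rollover_size, "max_age": rollover_age}
                    }
                },
                "cold": {
                    "min_age": cold_after,  # assumed threshold, adjust per workload
                    "actions": {"allocate": {"require": {"data": "cold"}}},
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def build_index_template(policy_name, patterns):
    """Body for PUT /_index_template/<name>: new indices start on hot nodes."""
    return {
        "index_patterns": patterns,
        "data_stream": {},
        "template": {
            "settings": {
                "index.lifecycle.name": policy_name,
                "index.routing.allocation.require.data": "hot",
            }
        },
    }


print(json.dumps(build_ilm_policy(), indent=2))
print(json.dumps(build_index_template("hot-cold-policy", ["logs-*"]), indent=2))
```

Keeping the placement rule for new indices in the template and the later moves in the policy mirrors how ILM works: the template decides where an index is born, and each phase's allocate action overrides that placement as the index ages.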

Deploy Hot-Cold Nodes

In an Elasticsearch cluster, tag each node with the tier it belongs to. The tag is a node-level setting applied at startup, not a cluster setting: a cluster-wide cluster.routing.allocation.require.* rule would constrain every index in the cluster.

  1. Start Hot Nodes:
bash
# Run on each hot node, or set node.attr.data: hot in elasticsearch.yml
bin/elasticsearch -Enode.attr.data=hot
  2. Start Cold Nodes:
bash
# Run on each cold node, or set node.attr.data: cold in elasticsearch.yml
bin/elasticsearch -Enode.attr.data=cold

Node Configuration Recommendations:

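  • Hot Nodes: Set the custom attribute node.attr.data: hot in elasticsearch.yml.
  • Cold Nodes: Set the custom attribute node.attr.data: cold in elasticsearch.yml.
  • Cold Nodes still answer queries for the indices they hold, so they cannot simply be excluded from search. Protect query performance by keeping write traffic off them: index only into Hot indices.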
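
Once nodes are tagged, it is worth confirming that each tier actually has members. The sketch below groups nodes by their data attribute, assuming rows shaped like the output of GET /_cat/nodeattrs?format=json (with node, attr, and value columns); the sample rows and node names are hypothetical, and `nodes_by_tier` is an illustrative helper.

```python
from collections import defaultdict


def nodes_by_tier(nodeattrs, attr_name="data"):
    """Group node names by the value of one custom node attribute."""
    tiers = defaultdict(list)
    for row in nodeattrs:
        if row.get("attr") == attr_name:
            tiers[row["value"]].append(row["node"])
    return {tier: sorted(names) for tier, names in tiers.items()}


# Hypothetical sample standing in for a real _cat/nodeattrs response.
sample = [
    {"node": "es-hot-1", "attr": "data", "value": "hot"},
    {"node": "es-hot-2", "attr": "data", "value": "hot"},
    {"node": "es-cold-1", "attr": "data", "value": "cold"},
    {"node": "es-cold-1", "attr": "xpack.installed", "value": "true"},
]

tiers = nodes_by_tier(sample)
print(tiers)
if "cold" not in tiers:
    print("warning: no cold nodes found; cold-phase allocation would stall")
```

A tier with no nodes is a common cause of shards that stay unassigned after an ILM allocate action, so a check like this is useful before enabling the policy.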

Code Example: Automated Data Migration

The following Python script demonstrates a manual migration using the Elasticsearch Python client: it creates an index on Hot Nodes, then retargets it at Cold Nodes. ILM's allocate action performs the same settings change automatically.

python
from elasticsearch import Elasticsearch

# body= is the 7.x client style; 8.x clients prefer the settings= keyword.
client = Elasticsearch("http://localhost:9200")

# Create the index and pin its shards to nodes tagged node.attr.data: hot.
client.indices.create(
    index="logs-2023-10",
    body={
        "settings": {
            "index.routing.allocation.require.data": "hot",
        }
    },
)

# Later (for example, once the data has aged), retarget the index at
# cold nodes; Elasticsearch relocates the shards in the background.
client.indices.put_settings(
    index="logs-2023-10",
    body={"index.routing.allocation.require.data": "cold"},
)

Important Notes:

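  • ILM is enabled by default, but a policy must exist before indices can use it: create it with PUT /_ilm/policy/<policy-name> and reference it through index.lifecycle.name.
  • Cold Data migration happens in the cold phase, before the delete phase; shards remain searchable while they relocate, so queries are not interrupted.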

Practical Recommendations

Monitoring and Tuning

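  • Key Metrics: Monitor indices.indexing.index_total and indices.search.query_total per node in GET /_nodes/stats, and keep Hot Node load below 70%.
  • Tool Recommendations: Use Kibana dashboards, together with its Index Lifecycle Policies page, to track migration progress and see which indices are in which phase.
  • Threshold Settings: Alert when a Hot Data shard grows beyond 80GB; this usually means rollover is not firing, and the index may need to be rolled over manually or split.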
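
The shard-size threshold above can be checked with a few lines of code. This sketch assumes rows shaped like the output of GET /_cat/shards?format=json&bytes=b, where store is the shard size in bytes as a string; the sample rows are hypothetical, and `oversized_shards` is an illustrative helper rather than part of any client library.

```python
GIB = 1024 ** 3


def oversized_shards(shards, limit_gb=80.0):
    """Return (index, shard, size_gb) for primary shards above the size limit."""
    flagged = []
    for row in shards:
        # Skip replicas and unassigned shards, which report no store size.
        if row.get("prirep") != "p" or row.get("store") is None:
            continue
        size_bytes = int(row["store"])
        if size_bytes > limit_gb * GIB:
            flagged.append((row["index"], int(row["shard"]), round(size_bytes / GIB, 1)))
    return sorted(flagged)


# Hypothetical sample standing in for a real _cat/shards response.
sample = [
    {"index": "logs-000001", "shard": "0", "prirep": "p", "store": str(90 * GIB)},
    {"index": "logs-000001", "shard": "0", "prirep": "r", "store": str(90 * GIB)},
    {"index": "logs-000002", "shard": "0", "prirep": "p", "store": str(20 * GIB)},
    {"index": "logs-000003", "shard": "0", "prirep": "p", "store": None},
]

print(oversized_shards(sample))
```

Only primaries are counted so that each oversized shard is reported once; the result can feed an alert or a manual rollover.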

Avoid Common Pitfalls

  • Data Fragmentation: Mixing Hot and Cold data on the same nodes degrades query performance; isolate them with index.routing.allocation.require settings.
  • Cold Data Query Latency: Queries against Cold Nodes are slower, and Cold indices are typically read-only; retain the optional Warm stage for data that still needs interactive analysis.
  • Configuration Errors: A misconfigured rollover (for example, a wrong index.lifecycle.rollover_alias) stalls the lifecycle; regularly verify ILM status with GET /<index>/_ilm/explain.
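
The ILM status check mentioned above can be automated by scanning the explain output for indices stuck in an error step. The sketch assumes a response shaped like GET /<index>/_ilm/explain, with an indices map whose entries carry managed, phase, step, failed_step, and step_info fields; the sample response is hypothetical, and `ilm_errors` is an illustrative helper.

```python
def ilm_errors(explain_response):
    """List (index, phase, failed_step, reason) for managed indices in the ERROR step."""
    problems = []
    for name, info in explain_response.get("indices", {}).items():
        if info.get("managed") and info.get("step") == "ERROR":
            reason = (info.get("step_info") or {}).get("reason", "unknown")
            problems.append((name, info.get("phase"), info.get("failed_step"), reason))
    return sorted(problems)


# Hypothetical sample standing in for a real _ilm/explain response.
sample = {
    "indices": {
        "logs-000001": {
            "managed": True,
            "phase": "hot",
            "step": "ERROR",
            "failed_step": "check-rollover-ready",
            "step_info": {"reason": "rollover alias is not set"},
        },
        "logs-000002": {"managed": True, "phase": "cold", "step": "complete"},
        "scratch": {"managed": False},
    }
}

for problem in ilm_errors(sample):
    print(problem)
```

Running a check like this on a schedule surfaces stalled indices early, before Hot Nodes fill up with data that should already have moved.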

Performance Optimization Tips

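  • Storage Compression: Enable index.codec: best_compression for Cold Data to save space, and keep the default codec for Hot Data, where indexing and retrieval speed matter more. The setting is static, so define it at index creation or on a closed index.
  • Bulk Operations: Use the bulk API for Hot Data writes to boost throughput.
  • Auto-Scaling: On Kubernetes, Hot Nodes can be scaled with HPA based on CPU metrics, but data nodes are stateful and every scale event relocates shards, so scale conservatively.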
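
The bulk API expects newline-delimited JSON: one action line followed by one document line per item, with a trailing newline. The sketch below builds such a body for POST /_bulk (sent with Content-Type: application/x-ndjson); `bulk_body` and the sample documents are hypothetical, and the official Python client also ships bulk helpers that wrap this format.

```python
import json


def bulk_body(index, docs, action="index"):
    """Build an NDJSON payload for the bulk API from a list of documents."""
    # Data streams only accept the "create" action; plain indices accept "index".
    lines = []
    for doc in docs:
        lines.append(json.dumps({action: {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


docs = [
    {"message": "user login", "level": "info"},
    {"message": "disk almost full", "level": "warn"},
]
print(bulk_body("logs-2023-10", docs), end="")
```

Batching many documents into one request amortizes the per-request overhead, which is why bulk writes raise indexing throughput on Hot Nodes.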

Conclusion

Elasticsearch's Hot-Cold Architecture can substantially reduce storage costs and improve query performance through lifecycle management. Design from the business scenario, define clear Hot-Cold thresholds, and automate transitions with ILM and node attributes or roles. With proper configuration, cloud storage savings in the range of 30-60% are reported, along with better query response times; actual results depend on the workload. Developers should deploy ILM policies early and continuously monitor cluster health. A possible future direction is machine learning-driven dynamic resource allocation that makes the architecture more adaptive. Remember: Hot-Cold Architecture is not a silver bullet; it requires iterative adjustment based on data characteristics to reach a good balance.

[Figure: Elasticsearch Hot-Cold Architecture Diagram]

Appendix: Key Configuration Cheat Sheet

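Component | Hot Data | Cold Data
Storage Type | SSD (e.g., EBS gp3) | HDD or other low-cost storage (S3 only via snapshots)
Node Attribute | node.attr.data: hot | node.attr.data: cold
Index Settings | index.codec: default | index.codec: best_compression
Lifecycle | rollover max_age: 7d | delete min_age: 90d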
Tags: Elasticsearch