In modern big data applications, Elasticsearch serves as a distributed search and analytics engine where performance and cost optimization are critical. As data volumes surge, single-tier architectures struggle to deliver high throughput, low latency, and cost-effective storage at the same time. Hot-Cold Architecture addresses this by partitioning data by access frequency: Hot Data (recently active, frequently queried indices such as logs or real-time transaction data) and Cold Data (historical or rarely accessed indices such as archived logs). This split enables fine-grained resource management: hot data lives on high-performance nodes to accelerate queries, while cold data migrates to low-cost nodes to reduce storage overhead. This article covers the design principles, implementation details, and best practices of Hot-Cold Architecture, helping developers build efficient, scalable Elasticsearch deployments.
Overview of Hot-Cold Architecture
Definition and Background
The core concept of Hot-Cold Architecture is dynamically allocating resources based on the data lifecycle. Hot Data refers to recently active indices requiring high I/O and low-latency access; Cold Data refers to historical or infrequently accessed indices that can tolerate higher latency but require low-cost storage. Elasticsearch 7.10 and later natively supports this architecture through Index Lifecycle Management (ILM), Data Streams, and data tiers, eliminating the complexity of manual shard management.
Why is Hot-Cold Architecture needed?
- Cost Optimization: Storage costs for Cold Data can be reduced by over 60% (based on AWS S3 vs. EBS benchmark tests).
- Performance Improvement: Hot nodes can reduce query latency by 40% (referencing Elastic Stack performance reports).
- Scalability: Supports dynamic data growth, preventing single-cluster overload.
Key Components
Hot-Cold Architecture relies on these core components:
- Hot Nodes: Equipped with SSD storage, high CPU, and memory, optimized for indexing and searching.
- Cold Nodes: Utilizing HDD storage and low-cost instances, designed exclusively for read-only queries.
- Index Lifecycle Management (ILM): Automates data routing policies, triggering migration based on time or size.
- Data Streams: Simplifies index management by automatically creating time-partitioned indices.
Design Principles
Data Lifecycle Management
When designing Hot-Cold Architecture, define clear data lifecycle stages:
- Hot Stage: Data within 7 days of creation, used for high-frequency queries.
- Warm Stage: Data retained for 30 days, used only for read operations (optional).
- Cold Stage: Data older than 30 days, moved to low-cost storage; it remains searchable at higher latency and is deleted after 90 days.
Design Considerations:
- Set thresholds based on business scenarios: for example, log-based applications typically configure `max_age: 7d` for the Hot stage.
- Avoid overcomplication: the Warm stage is optional; transitioning directly to Cold simplifies the architecture.
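As a quick sketch, the staging rules above can be expressed as a small helper; the 7/30/90-day thresholds are the example values from this section, not fixed defaults:

```python
def lifecycle_phase(age_days: int) -> str:
    """Map an index's age in days to its lifecycle phase, using the
    example thresholds from this section (7d hot, 30d warm, 90d retention)."""
    if age_days < 0:
        raise ValueError("age must be non-negative")
    if age_days <= 7:
        return "hot"
    if age_days <= 30:
        return "warm"
    if age_days <= 90:
        return "cold"
    return "delete"
```

In a real deployment ILM evaluates these thresholds for you; a helper like this is mainly useful for capacity planning or auditing scripts.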
Sharding Strategy
Sharding strategies must align with Hot-Cold nodes:
- Hot Data Shards: Allocated to Hot Nodes, ensuring shard size < 50GB (to prevent single-node overload).
- Cold Data Shards: Migrated to Cold Nodes, allowing shard size > 50GB to conserve resources.
Best Practices:
- Set `number_of_shards` to 1 for rollover-managed indices and let rollover cap index size; this avoids oversized shards and keeps hot and cold shards cleanly separated.
- Enable `index.codec: best_compression` on indices entering the Cold stage to reduce storage footprint (the default LZ4 codec is faster for hot indexing).
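The 50GB shard guideline can be turned into a simple sizing helper for non-rollover indices; this is an illustrative calculation, not an Elasticsearch API:

```python
import math

def primary_shard_count(expected_size_gb: float, max_shard_gb: float = 50.0) -> int:
    """Suggest a primary shard count so no shard exceeds the 50GB
    guideline from this section. Rollover-managed indices usually
    stay at 1 because rollover caps total index size instead."""
    if expected_size_gb <= 0:
        raise ValueError("expected size must be positive")
    return max(1, math.ceil(expected_size_gb / max_shard_gb))
```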
Implementation Steps
Configure ILM Policy
ILM is the foundation for implementing Hot-Cold Architecture. Define policies via API to specify migration rules:
```json
PUT _ilm/policy/logs-hot-cold
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "7d" }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": { "require": { "data": "cold" } }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```
Key Configuration Notes:
- `rollover`: automatically rolls over indices when size reaches 50GB or age hits 7 days.
- `allocate.require`: forces data routing to hot/cold nodes (requires prior node attribute configuration).
- `delete`: automatically deletes cold data after 90 days.
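For scripted deployments, the same policy body can be assembled in Python before registering it through the client's ILM API; the policy structure follows the ILM API, while the 30-day cold threshold is an assumed example value:

```python
def build_hot_cold_policy(rollover_size="50gb", rollover_age="7d",
                          cold_age="30d", delete_age="90d") -> dict:
    """Build an ILM policy body implementing the hot -> cold -> delete
    lifecycle described above. Phase and action keys follow the
    Elasticsearch ILM API."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": rollover_size,
                                     "max_age": rollover_age}
                    }
                },
                "cold": {
                    "min_age": cold_age,
                    "actions": {"allocate": {"require": {"data": "cold"}}},
                },
                "delete": {"min_age": delete_age,
                           "actions": {"delete": {}}},
            }
        }
    }
```

Generating the body in code makes it easy to keep thresholds in one place and reuse them across environments.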
Deploy Hot-Cold Nodes
In an Elasticsearch cluster, node roles must be defined explicitly. The `data` attribute is set per node in `elasticsearch.yml` (the cluster settings API cannot assign attributes to individual nodes):

- Hot Nodes (`elasticsearch.yml`):

```yaml
node.attr.data: hot
```

- Cold Nodes (`elasticsearch.yml`):

```yaml
node.attr.data: cold
```

New indices are then pinned to hot nodes via an index-level allocation setting:

```bash
curl -XPUT "http://localhost:9200/logs-000001" \
  -H 'Content-Type: application/json' \
  -d '{ "settings": { "index.routing.allocation.require.data": "hot" } }'
```
Node Configuration Recommendations:
- Hot Nodes: tag each node with the `data` attribute in `elasticsearch.yml` (e.g., `node.attr.data: hot`).
- Cold Nodes: tag with the same attribute set to `cold` (e.g., `node.attr.data: cold`).
- On 7.10+, the built-in data tier roles (`data_hot`, `data_cold`) can replace custom attributes; keeping heavy search workloads off cold nodes prevents query performance degradation.
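When scripting allocation from Python, the per-index settings body can be built with a small helper; this mirrors what ILM's `allocate` action applies automatically in the cold phase:

```python
def tier_allocation_settings(tier: str) -> dict:
    """Return index settings that pin shards to nodes whose
    node.attr.data attribute matches the given tier."""
    if tier not in ("hot", "cold"):
        raise ValueError("tier must be 'hot' or 'cold'")
    return {"index.routing.allocation.require.data": tier}
```

The returned dict can be passed as the body of an `indices.put_settings` call to move an index between tiers manually.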
Code Example: Automated Data Migration
The following Python script demonstrates the key index settings using the official Elasticsearch Python client:

```python
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")

# Create the first backing index with the ILM policy attached;
# the allocation setting keeps new data on hot nodes.
client.indices.create(
    index="logs-000001",
    body={
        "settings": {
            "index.lifecycle.name": "logs-hot-cold",
            "index.lifecycle.rollover_alias": "logs",
            "index.routing.allocation.require.data": "hot",
        },
        "aliases": {"logs": {"is_write_index": True}},
    },
)

# Manually move an aged index to cold nodes (ILM's cold-phase
# allocate action normally does this automatically).
client.indices.put_settings(
    index="logs-000001",
    body={"index.routing.allocation.require.data": "cold"},
)
```
Important Notes:
- The ILM policy must exist before index creation: register it with `PUT /_ilm/policy/<name>`.
- Cold migration is performed by the `allocate` action of the cold phase; the `delete` phase only removes expired indices, so queries are not interrupted by migration.
Practical Recommendations
Monitoring and Tuning
- Key Metrics: monitor `indexing.index_total` and `search.query_total` in the node stats to keep hot node load below 70%.
- Tool Recommendations: use Kibana dashboards to track migration progress (e.g., the current ILM phase of each index).
- Threshold Settings: automatically trigger shard reorganization when hot shard size exceeds 80GB.
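A monitoring script can evaluate the 70% load guideline directly; the response shape assumed here is a simplified version of the `_nodes/stats` API output:

```python
def hot_node_cpu_ok(node_stats: dict, threshold_pct: int = 70) -> dict:
    """Check each hot node's CPU usage against the 70% guideline.

    `node_stats` is assumed to be shaped like a `_nodes/stats` response:
    each node entry carries an `attributes` map (with the `data` tier
    attribute) and `os.cpu.percent`. Returns {node_id: within_budget}.
    """
    result = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        # Skip nodes that are not tagged as hot.
        if node.get("attributes", {}).get("data") != "hot":
            continue
        cpu = node["os"]["cpu"]["percent"]
        result[node_id] = cpu < threshold_pct
    return result
```

Such a check can feed an alerting pipeline or drive the auto-scaling decisions mentioned below.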
Avoid Common Pitfalls
- Data Fragmentation: mixing hot and cold data on the same nodes degrades query performance; isolate them via `routing.allocation.require` settings.
- Cold Data Query Latency: cold nodes serve queries slowly; retain an optional Warm stage for data that still needs near-real-time analysis.
- Configuration Errors: a misconfigured `rollover` action leaves indices stuck in the hot phase; regularly verify ILM status with `GET <index>/_ilm/explain`.
Performance Optimization Tips
- Storage Compression: enable `index.codec: best_compression` when indices move to the Cold stage to save space; keep the default codec on hot indices, since heavier compression slows indexing.
- Bulk Operations: use the `_bulk` API for hot data writes to boost throughput.
- Auto-Scaling: deploy hot nodes on Kubernetes and use HPA to dynamically adjust capacity based on CPU metrics.
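As a minimal sketch, bulk writes can be prepared as an action generator for the Python client's `elasticsearch.helpers.bulk`; the index name `logs` is an assumed write alias:

```python
def bulk_actions(docs, index="logs"):
    """Generate bulk actions for elasticsearch.helpers.bulk().

    Each action targets the write alias (assumed to be "logs" here),
    so new documents land on the current hot backing index.
    """
    for doc in docs:
        yield {"_op_type": "index", "_index": index, "_source": doc}
```

Usage would look like `helpers.bulk(client, bulk_actions(docs))`, which batches the requests instead of issuing one HTTP call per document.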
Conclusion
Elasticsearch's Hot-Cold Architecture significantly optimizes storage costs and query performance through lifecycle management. Design with business scenarios as the foundation, define clear Hot-Cold thresholds, and implement automation via ILM and node roles. Practical results show that proper configuration can reduce cloud storage costs by 30-60% while improving query response times. Developers should prioritize deploying ILM policies and continuously monitor cluster health. Future trends include machine learning-driven dynamic resource allocation (e.g., Elasticsearch 8.0's ML features) to enhance architectural intelligence. Remember: Hot-Cold Architecture is not a silver bullet; it requires iterative adjustments based on data characteristics to achieve optimal balance.

Appendix: Key Configuration Cheat Sheet
| Component | Hot Data | Cold Data |
|---|---|---|
| Storage Type | SSD (e.g., EBS gp3) | HDD or object storage (e.g., S3) |
| Node Attribute | `data: hot` | `data: cold` |
| Index Codec | default (LZ4) | `index.codec: best_compression` |
| Lifecycle | rollover at `max_age: 7d` | delete at `min_age: 90d` |