In modern big data applications, Elasticsearch serves as a distributed search and analytics engine where performance and cost optimization are critical. As data volumes surge, single-tier architectures struggle to deliver high throughput, low latency, and cost-effective storage at the same time. Hot-Cold Architecture addresses this by partitioning data by access frequency: Hot Data (recently active, frequently queried indices such as logs or real-time transaction data) and Cold Data (historical or rarely accessed indices such as archived logs). This split enables fine-grained resource management: hot data lives on high-performance nodes to accelerate queries, while cold data migrates to low-cost nodes to reduce storage overhead. This article covers the design principles, implementation details, and best practices of Hot-Cold Architecture, helping developers build efficient, scalable Elasticsearch deployments.
Overview of Hot-Cold Architecture
Definition and Background
The core concept of Hot-Cold Architecture is dynamically allocating resources based on the data lifecycle. Hot Data refers to recently active indices requiring high I/O and low-latency access; Cold Data refers to historical or infrequently accessed indices that can tolerate higher latency but require low-cost storage. Elasticsearch 7.10 and later natively supports this architecture through Index Lifecycle Management (ILM), Data Streams, and data tiers, eliminating the complexity of manual shard management.
Why is Hot-Cold Architecture needed?
- Cost Optimization: Storage costs for Cold Data can be reduced by over 60% (based on AWS S3 vs. EBS benchmark tests).
- Performance Improvement: Hot nodes can reduce query latency by 40% (referencing Elastic Stack performance reports).
- Scalability: Supports dynamic data growth, preventing single-cluster overload.
Key Components
Hot-Cold Architecture relies on these core components:
- Hot Nodes: Equipped with SSD storage, high CPU, and memory, optimized for indexing and searching.
- Cold Nodes: Utilizing HDD storage and low-cost instances, designed exclusively for read-only queries.
- Index Lifecycle Management (ILM): Automates data routing policies, triggering migration based on time or size.
- Data Streams: Simplifies index management by automatically creating time-partitioned indices.
Design Principles
Data Lifecycle Management
When designing Hot-Cold Architecture, define clear data lifecycle stages:
- Hot Stage: Data within 7 days of creation, used for high-frequency queries.
- Warm Stage: Data retained for 30 days, used only for read operations (optional).
- Cold Stage: Data older than 30 days, moved to low-cost storage; it remains searchable at higher latency and is deleted after 90 days.
Design Considerations:
- Set thresholds based on business scenarios: for example, log-based applications typically configure `max_age: 7d` for the Hot stage.
- Avoid overcomplication: the Warm stage is optional; transitioning directly to Cold simplifies the architecture.
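As a quick sketch, the staging rules above can be expressed as a small helper; the 7/30/90-day thresholds are the example values from this section, not fixed defaults:

```python
def lifecycle_phase(age_days: int) -> str:
    """Map an index's age in days to its lifecycle phase, using the
    example thresholds from this section (7d hot, 30d warm, 90d retention)."""
    if age_days < 0:
        raise ValueError("age must be non-negative")
    if age_days <= 7:
        return "hot"
    if age_days <= 30:
        return "warm"
    if age_days <= 90:
        return "cold"
    return "delete"
```

In a real deployment ILM evaluates these thresholds for you; a helper like this is mainly useful for capacity planning or auditing scripts.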
Sharding Strategy
Sharding strategies must align with Hot-Cold nodes:
- Hot Data Shards: Allocated to Hot Nodes, ensuring shard size < 50GB (to prevent single-node overload).
- Cold Data Shards: Migrated to Cold Nodes, allowing shard size > 50GB to conserve resources.
Best Practices:
- Set `number_of_shards` to 1 for rollover-managed indices and let rollover cap index size; this avoids oversized shards and keeps hot and cold shards cleanly separated.
- Enable `index.codec: best_compression` on indices entering the Cold stage to reduce storage footprint (the default LZ4 codec is faster for hot indexing).
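The 50GB shard guideline can be turned into a simple sizing helper for non-rollover indices; this is an illustrative calculation, not an Elasticsearch API:

```python
import math

def primary_shard_count(expected_size_gb: float, max_shard_gb: float = 50.0) -> int:
    """Suggest a primary shard count so no shard exceeds the 50GB
    guideline from this section. Rollover-managed indices usually
    stay at 1 because rollover caps total index size instead."""
    if expected_size_gb <= 0:
        raise ValueError("expected size must be positive")
    return max(1, math.ceil(expected_size_gb / max_shard_gb))
```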
Implementation Steps
Configure ILM Policy
ILM is the foundation for implementing Hot-Cold Architecture. Define policies via API to specify migration rules:
```json
PUT _ilm/policy/logs-hot-cold
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "7d" }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": { "require": { "data": "cold" } }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```
Key Configuration Notes:
- `rollover`: automatically rolls over indices when size reaches 50GB or age hits 7 days.
- `allocate.require`: forces data routing to hot/cold nodes (requires prior node attribute configuration).
- `delete`: automatically deletes cold data after 90 days.
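For scripted deployments, the same policy body can be assembled in Python before registering it through the client's ILM API; the policy structure follows the ILM API, while the 30-day cold threshold is an assumed example value:

```python
def build_hot_cold_policy(rollover_size="50gb", rollover_age="7d",
                          cold_age="30d", delete_age="90d") -> dict:
    """Build an ILM policy body implementing the hot -> cold -> delete
    lifecycle described above. Phase and action keys follow the
    Elasticsearch ILM API."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": rollover_size,
                                     "max_age": rollover_age}
                    }
                },
                "cold": {
                    "min_age": cold_age,
                    "actions": {"allocate": {"require": {"data": "cold"}}},
                },
                "delete": {"min_age": delete_age,
                           "actions": {"delete": {}}},
            }
        }
    }
```

Generating the body in code makes it easy to keep thresholds in one place and reuse them across environments.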
Deploy Hot-Cold Nodes
In an Elasticsearch cluster, node roles must be defined explicitly. The `data` attribute is set per node in `elasticsearch.yml` (the cluster settings API cannot assign attributes to individual nodes):

- Hot Nodes (`elasticsearch.yml`):

```yaml
node.attr.data: hot
```

- Cold Nodes (`elasticsearch.yml`):

```yaml
node.attr.data: cold
```

New indices are then pinned to hot nodes via an index-level allocation setting:

```bash
curl -XPUT "http://localhost:9200/logs-000001" \
  -H 'Content-Type: application/json' \
  -d '{ "settings": { "index.routing.allocation.require.data": "hot" } }'
```
Node Configuration Recommendations:
- Hot Nodes: tag each node with the `data` attribute in `elasticsearch.yml` (e.g., `node.attr.data: hot`).
- Cold Nodes: tag with the same attribute set to `cold` (e.g., `node.attr.data: cold`).
- On 7.10+, the built-in data tier roles (`data_hot`, `data_cold`) can replace custom attributes; keeping heavy search workloads off cold nodes prevents query performance degradation.
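When scripting allocation from Python, the per-index settings body can be built with a small helper; this mirrors what ILM's `allocate` action applies automatically in the cold phase:

```python
def tier_allocation_settings(tier: str) -> dict:
    """Return index settings that pin shards to nodes whose
    node.attr.data attribute matches the given tier."""
    if tier not in ("hot", "cold"):
        raise ValueError("tier must be 'hot' or 'cold'")
    return {"index.routing.allocation.require.data": tier}
```

The returned dict can be passed as the body of an `indices.put_settings` call to move an index between tiers manually.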
Code Example: Automated Data Migration
The following Python script demonstrates the key index settings using the official Elasticsearch Python client:

```python
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")

# Create the first backing index with the ILM policy attached;
# the allocation setting keeps new data on hot nodes.
client.indices.create(
    index="logs-000001",
    body={
        "settings": {
            "index.lifecycle.name": "logs-hot-cold",
            "index.lifecycle.rollover_alias": "logs",
            "index.routing.allocation.require.data": "hot",
        },
        "aliases": {"logs": {"is_write_index": True}},
    },
)

# Manually move an aged index to cold nodes (ILM's cold-phase
# allocate action normally does this automatically).
client.indices.put_settings(
    index="logs-000001",
    body={"index.routing.allocation.require.data": "cold"},
)
```
Important Notes:
- The ILM policy must exist before index creation: register it with `PUT /_ilm/policy/<name>`.
- Cold migration is performed by the `allocate` action of the cold phase; the `delete` phase only removes expired indices, so queries are not interrupted by migration.
Practical Recommendations
Monitoring and Tuning
- Key Metrics: monitor `indexing.index_total` and `search.query_total` in the node stats to keep hot node load below 70%.
- Tool Recommendations: use Kibana dashboards to track migration progress (e.g., the current ILM phase of each index).
- Threshold Settings: automatically trigger shard reorganization when hot shard size exceeds 80GB.
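A monitoring script can evaluate the 70% load guideline directly; the response shape assumed here is a simplified version of the `_nodes/stats` API output:

```python
def hot_node_cpu_ok(node_stats: dict, threshold_pct: int = 70) -> dict:
    """Check each hot node's CPU usage against the 70% guideline.

    `node_stats` is assumed to be shaped like a `_nodes/stats` response:
    each node entry carries an `attributes` map (with the `data` tier
    attribute) and `os.cpu.percent`. Returns {node_id: within_budget}.
    """
    result = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        # Skip nodes that are not tagged as hot.
        if node.get("attributes", {}).get("data") != "hot":
            continue
        cpu = node["os"]["cpu"]["percent"]
        result[node_id] = cpu < threshold_pct
    return result
```

Such a check can feed an alerting pipeline or drive the auto-scaling decisions mentioned below.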
Avoid Common Pitfalls
- Data Fragmentation: mixing hot and cold data on the same nodes degrades query performance; isolate them via `routing.allocation.require` settings.
- Cold Data Query Latency: cold nodes serve queries slowly; retain an optional Warm stage for data that still needs near-real-time analysis.
- Configuration Errors: a misconfigured `rollover` action leaves indices stuck in the hot phase; regularly verify ILM status with `GET <index>/_ilm/explain`.
Performance Optimization Tips
- Storage Compression: enable `index.codec: best_compression` when indices move to the Cold stage to save space; keep the default codec on hot indices, since heavier compression slows indexing.
- Bulk Operations: use the `_bulk` API for hot data writes to boost throughput.
- Auto-Scaling: deploy hot nodes on Kubernetes and use HPA to dynamically adjust capacity based on CPU metrics.
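As a minimal sketch, bulk writes can be prepared as an action generator for the Python client's `elasticsearch.helpers.bulk`; the index name `logs` is an assumed write alias:

```python
def bulk_actions(docs, index="logs"):
    """Generate bulk actions for elasticsearch.helpers.bulk().

    Each action targets the write alias (assumed to be "logs" here),
    so new documents land on the current hot backing index.
    """
    for doc in docs:
        yield {"_op_type": "index", "_index": index, "_source": doc}
```

Usage would look like `helpers.bulk(client, bulk_actions(docs))`, which batches the requests instead of issuing one HTTP call per document.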
Conclusion
Elasticsearch's Hot-Cold Architecture significantly optimizes storage costs and query performance through lifecycle management. Design with business scenarios as the foundation, define clear Hot-Cold thresholds, and implement automation via ILM and node roles. Practical results show that proper configuration can reduce cloud storage costs by 30-60% while improving query response times. Developers should prioritize deploying ILM policies and continuously monitor cluster health. Future trends include machine learning-driven dynamic resource allocation (e.g., Elasticsearch 8.0's ML features) to enhance architectural intelligence. Remember: Hot-Cold Architecture is not a silver bullet; it requires iterative adjustments based on data characteristics to achieve optimal balance.

Appendix: Key Configuration Cheat Sheet
| Component | Hot Data | Cold Data |
|---|---|---|
| Storage Type | SSD (e.g., EBS gp3) | HDD or object storage (e.g., S3) |
| Node Attribute | `data: hot` | `data: cold` |
| Index Codec | default (LZ4) | `index.codec: best_compression` |
| Lifecycle | rollover at `max_age: 7d` | delete at `min_age: 90d` |