Node Role Separation and Configuration
In Elasticsearch, proper allocation of node roles (such as master, data, and coordinating) is crucial for avoiding single points of failure and resource wastage. Master nodes manage cluster metadata, data nodes store index data, and coordinating nodes handle client requests. Incorrect role assignment can lead to performance bottlenecks or data loss.
Configuration Principles:
- Strictly separate roles: in production, run at least 3 dedicated master-eligible nodes (an odd number avoids split-brain scenarios), and keep data nodes separate from coordinating nodes.
- Configure roles via `elasticsearch.yml`:

```yaml
# Example: dedicated data node (no master role)
node.roles: [data, ingest]

# Example: dedicated master-eligible node (run 3 of these)
node.roles: [master]
```
- Practical Recommendations: enable `xpack.security.enabled: true` for security, and avoid assigning all roles to a single node. Monitor metrics including cluster health status (`GET /_cluster/health`) and per-node load (`GET /_nodes/stats`).
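As a quick sanity check on role separation, the output of `GET /_cat/nodes?h=name,node.role` can be scanned for nodes that mix master duties with data or ingest work. A minimal sketch in Python, using an illustrative sample response rather than a real cluster:

```python
# Illustrative `GET /_cat/nodes?h=name,node.role` output (not a real cluster);
# role letters: m = master, d = data, i = ingest, `-` = coordinating-only.
cat_nodes_sample = """\
es-master-1 m
es-master-2 m
es-master-3 m
es-data-1   di
es-data-2   di
es-coord-1  -
"""

def check_role_separation(cat_nodes_text):
    """Return names of nodes that combine the master role with data/ingest."""
    mixed = []
    for line in cat_nodes_text.strip().splitlines():
        name, roles = line.split()
        if "m" in roles and any(r in roles for r in "di"):
            mixed.append(name)
    return mixed

print(check_role_separation(cat_nodes_sample))  # [] means roles are cleanly separated
```

An empty result means no node is pulling double duty; a non-empty list names nodes whose roles should be split.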
Shard and Replica Optimization
Shards split indices into parallel units, and replicas provide redundancy. Incorrect configuration can lead to performance degradation or data unavailability.
- Key Parameters:
  - `number_of_shards`: 3-5 is a sensible starting point (too few causes hotspots, too many increases overhead).
  - `number_of_replicas`: set to 1 or 2 in production (0 leaves a single point of failure).
  - Shard size: a single shard should not exceed 50GB (refer to the Elasticsearch official Shard Size Guidelines).
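The sizing guidance above can be turned into a rough starting-point calculation. This is an illustrative helper based on the rules of thumb (shards under ~50GB, at least 3 shards to spread load), not an official formula:

```python
import math

def suggest_shard_count(expected_size_gb, max_shard_gb=50, min_shards=3):
    """Rough starting point for number_of_shards: cap each shard at
    ~50GB and use at least 3 shards to spread load. Illustrative only;
    validate against real query and indexing patterns."""
    needed = math.ceil(expected_size_gb / max_shard_gb)
    return max(needed, min_shards)

print(suggest_shard_count(120))  # 3  (three ~40GB shards)
print(suggest_shard_count(400))  # 8  (50GB cap forces more than 5 shards)
```

For very large indices the 3-5 guideline gives way to the per-shard size cap, which is why the second call suggests 8.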
- Configuration Example:

```json
PUT /logs_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "index.refresh_interval": "30s"
  }
}
```

  Raising `index.refresh_interval` above the 1s default reduces refresh frequency and improves write performance.
- Practical Recommendations:
  - Set `index.codec: best_compression` for storage-heavy indices to save disk space.
  - Adjust replicas dynamically via the index settings API:

```json
PUT /logs_index/_settings
{
  "index": { "number_of_replicas": 2 }
}
```

  - Avoid creating too many indices on a single node (exceeding 100 can cause performance issues).
Index Lifecycle Management (ILM)
Index Lifecycle Management is central to scaling strategies. Unmanaged indices can lead to storage explosion and query latency.
- Best Practices:
  - Phase Division:
    - Hot phase: active data with high write volume; write through a rollover alias set via `index.lifecycle.rollover_alias`.
    - Warm phase: data with reduced query frequency; rolled-over indices move here onto cheaper hardware.
    - Cold phase: read-only data; migrate to low-cost nodes.
- Configuration Example:

```json
PUT /_ilm/policy/log_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          }
        }
      }
    }
  }
}
```
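The rollover trigger in a policy like the one above fires when either threshold is crossed (OR semantics). ILM evaluates this server-side; the following is only a minimal sketch of the logic for clarity:

```python
def should_rollover(index_size_gb, index_age_days, max_size_gb=50, max_age_days=30):
    """Mirror of ILM rollover semantics: roll over when EITHER the size
    or the age threshold is reached. Sketch only; ILM does this itself."""
    return index_size_gb >= max_size_gb or index_age_days >= max_age_days

print(should_rollover(12, 31))  # True: age threshold reached
print(should_rollover(60, 1))   # True: size threshold reached
print(should_rollover(12, 5))   # False: neither threshold reached
```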
- Scaling Strategies:
  - Use ILM to roll indices over automatically, avoiding manual management.
  - Monitor indexing-rate metrics; trigger scaling when write volume exceeds thresholds.
  - Practical Recommendations: combine with Kibana's Lens tool to analyze index distribution and ensure data balance.
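The indexing-rate check can be derived from two samples of the monotonically increasing `indexing.index_total` counter in `GET /_nodes/stats`. A minimal sketch with hypothetical sample values:

```python
def indexing_rate(prev_total, curr_total, interval_s):
    """Docs indexed per second between two `_nodes/stats` samples.
    indexing.index_total is a monotonically increasing counter."""
    return (curr_total - prev_total) / interval_s

# Hypothetical counter samples taken 60 seconds apart
rate = indexing_rate(prev_total=1_200_000, curr_total=1_500_000, interval_s=60)
print(rate)  # 5000.0 docs/s

SCALE_THRESHOLD = 4000  # illustrative threshold, tune per workload
print(rate > SCALE_THRESHOLD)  # True: consider scaling out
```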
Cluster Scaling and Balancing
Horizontal scaling requires careful execution to avoid data skew.
- Scaling Steps:
  - Add new nodes:

```bash
# Ensure the new node's configuration (elasticsearch.yml) matches the cluster,
# then make sure shard allocation is enabled so shards can migrate to it:
curl -XPUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient":{"cluster.routing.allocation.enable":"all"}}'
```
  - Monitor balancing: use `GET /_cat/shards?v` to confirm shard distribution.
- Avoid Issues:
  - Adding too many nodes at once can cause shard migration storms.
  - Ensure new nodes have similar hardware (CPU/RAM/SSD) to existing nodes.
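To confirm shard distribution after scaling, the `GET /_cat/shards?h=index,shard,prirep,node` output can be tallied per node. A minimal sketch using an illustrative sample response, not a real cluster:

```python
from collections import Counter

# Illustrative `GET /_cat/shards?h=index,shard,prirep,node` output
cat_shards_sample = """\
logs_index 0 p node_1
logs_index 0 r node_2
logs_index 1 p node_1
logs_index 1 r node_3
logs_index 2 p node_2
logs_index 2 r node_3
"""

def shards_per_node(cat_shards_text):
    """Count shards hosted by each node; large imbalances suggest skew."""
    counts = Counter()
    for line in cat_shards_text.strip().splitlines():
        counts[line.split()[-1]] += 1
    return dict(counts)

print(shards_per_node(cat_shards_sample))  # {'node_1': 2, 'node_2': 2, 'node_3': 2}
```

An even count per node, as here, indicates balanced allocation; a node with far fewer shards may not be receiving relocations.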
- Performance Optimization:
  - Keep `index.requests.cache.enable: true` (the default) on query-heavy indices to improve cache hit rates.
  - Keep `cluster.routing.allocation.enable: all` (the default) to allow automatic rebalancing.
  - Practical Recommendations: use the cluster reroute API to manually adjust shard locations:
```json
POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_replica": {
        "index": "logs_index",
        "shard": 0,
        "node": "node_3"
      }
    }
  ]
}
```
Monitoring and Alerting System
Real-time monitoring is essential for successful scaling.
- Core Tools:
  - Kibana: visualize cluster health (`GET /_cluster/health`); monitor metrics including `status` (green/yellow/red) and document counts (`docs.count` in `GET /_cat/indices?v`).
  - Elastic Stack: set up alerting rules (e.g., notify when disk usage exceeds 85%).
- Practical Recommendations:
  - Use `GET /_nodes/stats` to retrieve node statistics.
  - Regularly run `GET /_cluster/health?pretty` to check status.
- Avoid Common Pitfalls:
  - Do not leave `cluster.routing.allocation.enable` set to `none` after maintenance; shards will stay unassigned and the cluster remains yellow/red.
  - Monitor search latency (e.g., `search.query_time_in_millis` in node stats) to avoid query timeouts.
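The disk-usage alert rule mentioned above (notify past 85%, which mirrors the default low disk watermark) reduces to a simple threshold check over per-node usage figures. A sketch with hypothetical values, e.g. as derived from `GET /_cat/allocation?v`:

```python
# Hypothetical per-node disk usage percentages (illustrative values)
disk_usage_pct = {"node_1": 62.0, "node_2": 88.5, "node_3": 71.3}

def nodes_over_watermark(usage, threshold=85.0):
    """Flag nodes above the alert threshold. 85% mirrors the default
    cluster.routing.allocation.disk.watermark.low setting."""
    return [n for n, pct in sorted(usage.items()) if pct > threshold]

print(nodes_over_watermark(disk_usage_pct))  # ['node_2']
```

Nodes in the returned list should be investigated before the high watermark (90% by default) starts forcing shard relocations.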
Conclusion
The best practices for configuring and scaling Elasticsearch clusters revolve around systematic design and dynamic optimization: role separation, proper shard and replica settings, ILM, and monitoring/alerting form the core. Production recommendations:
- Priority: Ensure cluster health (green status) before scaling capacity.
- Continuous Improvement: regularly use `GET /_cluster/stats` to analyze performance bottlenecks, and adjust configurations based on log analysis.
- Security Note: enable `xpack.security.enabled: true` to protect the cluster and prevent unauthorized access.
By following these practices, system reliability can be significantly improved. Refer to the Elasticsearch Official Guide for deeper exploration, or use Docker Compose to quickly deploy a test environment.