Elasticsearch Cross-Cluster Replication (CCR) replicates indices from one cluster to another, keeping a near real-time copy of the data available for high availability and disaster recovery. It first shipped as a beta feature in Elasticsearch 6.5 and became generally available in 6.7. CCR addresses data silos in distributed systems through a leader/follower architecture, in which follower indices in one cluster replicate leader indices in another, and it is particularly suitable for multi-region deployments. This article covers the implementation principles, configuration steps, and best practices of CCR to help developers build cross-cluster data streams efficiently.
What is Elasticsearch Cross-Cluster Replication (CCR)?
CCR is a unidirectional, index-level replication mechanism: a follower index in one cluster (the target) replicates the operations of a leader index in another cluster (the source) in near real time. The leader index accepts writes; the follower index is read-only and pulls changes from the leader, so data always flows from leader to follower. The two clusters are connected through the remote cluster mechanism: the follower cluster registers the leader cluster under an alias and reaches it over the transport port, so clients of one cluster never need direct access to the other cluster's nodes.
Key components include:
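- Leader Cluster: The data source cluster, which holds the leader indices. For one-way replication it needs no remote cluster configuration of its own.
- Follower Cluster: The data receiving cluster, which registers the leader cluster as a remote under `cluster.remote.<alias>` and holds the follower indices.
- Replication Stream: The shard-level synchronization channel. Each follower shard pulls operations from the matching leader shard and uses sequence numbers to apply them in order.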
CCR offers advantages such as:
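- Low-latency synchronization: Operations written to the leader index are pulled by the follower over the transport protocol shortly after they are indexed.
- High availability: A follower cluster in another region can serve reads if the leader cluster becomes unavailable, which supports cross-region disaster recovery.
- Resource optimization: After the initial copy of the leader index, only new operations are transferred, which reduces bandwidth consumption.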
1. Remote Cluster Configuration
CCR builds on remote cluster registration. Because replication is pull-based, the follower cluster must register the leader cluster as a remote, for example in elasticsearch.yml:

```yaml
# Follower cluster configuration: register the leader under the alias "leader-cluster"
cluster.remote.leader-cluster.seeds: ["leader-cluster-node1:9300"]
```

The leader cluster needs the reverse registration only if you also replicate in the opposite direction (bidirectional setups):

```yaml
# Leader cluster configuration (optional, only needed for bidirectional replication)
cluster.remote.follower-cluster.seeds: ["follower-cluster-node1:9300", "follower-cluster-node2:9300"]
```
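Note: The alias under `cluster.remote` must be unique within the cluster that defines it, and it is the name you later pass as `remote_cluster` when creating a follower index. The seeds must point at the transport port (9300 by default), not the HTTP port. An incorrect configuration leads to connection failures; check the result with the `GET /_remote/info` API, which reports `"connected": true` for a working remote.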
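As a minimal sketch of the registration and the connection check, the Python helpers below build the body for `PUT /_cluster/settings` that registers a remote cluster, and read the `connected` flag from a `GET /_remote/info` response. The alias `leader-cluster`, the seed address, and the function names are illustrative assumptions, and the sample response is hand-written rather than captured from a real cluster.

```python
import json


def remote_cluster_settings(alias, seeds):
    """Build the body for PUT /_cluster/settings that registers a remote cluster."""
    if not seeds:
        raise ValueError("at least one seed node (host:transport_port) is required")
    return {"persistent": {"cluster": {"remote": {alias: {"seeds": list(seeds)}}}}}


def is_remote_connected(remote_info, alias):
    """Return True if a GET /_remote/info response reports the alias as connected."""
    return bool(remote_info.get(alias, {}).get("connected", False))


# Hypothetical alias and seed node, chosen for illustration only.
body = remote_cluster_settings("leader-cluster", ["leader-cluster-node1:9300"])
print(json.dumps(body, indent=2))

# Hand-written sample in the shape of a GET /_remote/info response.
sample_info = {
    "leader-cluster": {"connected": True, "seeds": ["leader-cluster-node1:9300"]}
}
print(is_remote_connected(sample_info, "leader-cluster"))
```

Registering the remote through the cluster settings API rather than elasticsearch.yml has the advantage that the setting applies to every node without a restart.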
2. Index-level Replication Configuration
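CCR operates at the index level and must be enabled explicitly for each index. The leader index is created as a normal index (soft deletes, which CCR requires, are enabled by default since Elasticsearch 7.0). The follower index is not created by hand; it is created on the follower cluster with the follow API:

```json
PUT /my-index/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster": "leader-cluster",
  "leader_index": "my-index"
}
```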
Key parameters:
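- `remote_cluster`: The alias of the leader cluster as registered on the follower cluster under `cluster.remote`.
- `leader_index`: The name of the index in the leader cluster to follow. The follower index name is taken from the request path and may differ from the leader index name.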
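A follower index is created on the follower cluster with the CCR follow API (`PUT /<follower_index>/_ccr/follow`), whose body names the remote cluster alias and the leader index. The sketch below only assembles that request; the helper name `build_follow_request` and the index and alias values are assumptions for illustration, and sending the request is left to whichever HTTP client you use.

```python
import json
from urllib.parse import quote


def build_follow_request(follower_index, remote_cluster, leader_index,
                         wait_for_active_shards=1):
    """Return (method, path, body) for the CCR follow API."""
    path = (
        f"/{quote(follower_index, safe='')}/_ccr/follow"
        f"?wait_for_active_shards={wait_for_active_shards}"
    )
    body = {"remote_cluster": remote_cluster, "leader_index": leader_index}
    return "PUT", path, body


# Hypothetical names: follower index "my-index" follows "my-index" on "leader-cluster".
method, path, body = build_follow_request("my-index", "leader-cluster", "my-index")
print(method, path)
print(json.dumps(body))
```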
3. Data Synchronization Process
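Data synchronization occurs in three stages:
- Data Writing: The client writes to the leader index; the primary shard assigns a sequence number to every operation.
- Stream Transmission: Each follower shard sends read requests to the matching leader shard over the remote cluster connection and pulls the operations whose sequence numbers are above its own checkpoint.
- Acknowledgement: The follower applies the operations in sequence-number order and advances its global checkpoint, which determines where the next read request starts.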

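Important note: CCR relies on soft deletes and retention leases on the leader to prevent data loss. The leader keeps the history of operations that a follower has not yet read. If a follower falls behind for longer than the retention lease period (`index.soft_deletes.retention_lease.period`, 12 hours by default), that history may be discarded, and the follower index must be re-created from the leader.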
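To make the role of sequence numbers concrete, here is a toy, in-memory model of pull-based replication: a follower shard repeatedly asks a leader shard for operations above its checkpoint and applies them in order. It is a teaching sketch only; the class names and batch size are invented and it does not reflect Elasticsearch internals beyond that general idea.

```python
class LeaderShard:
    """Toy leader shard: assigns a sequence number to every operation."""

    def __init__(self):
        self.ops = []  # ops[i] carries sequence number i

    def index(self, doc):
        seq_no = len(self.ops)
        self.ops.append((seq_no, doc))
        return seq_no

    def read_ops(self, from_seq_no, max_count):
        """Serve a follower read request: operations with seq_no >= from_seq_no."""
        return self.ops[from_seq_no:from_seq_no + max_count]


class FollowerShard:
    """Toy follower shard: pulls operations and applies them in order."""

    def __init__(self):
        self.docs = []
        self.checkpoint = -1  # highest sequence number applied so far

    def poll(self, leader, max_count=2):
        batch = leader.read_ops(self.checkpoint + 1, max_count)
        for seq_no, doc in batch:
            if seq_no != self.checkpoint + 1:
                raise RuntimeError("gap or reordering in the operation stream")
            self.docs.append(doc)
            self.checkpoint = seq_no
        return len(batch)


leader = LeaderShard()
follower = FollowerShard()
for doc in ["a", "b", "c"]:
    leader.index(doc)

while follower.poll(leader):  # keep pulling until the leader has nothing new
    pass

print(follower.docs, follower.checkpoint)
```

Because the follower tracks only its own checkpoint, it can resume after an interruption by asking for the next sequence number, as long as the leader still retains that part of its history.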
Practical Configuration: Setting Up CCR Clusters
The following steps demonstrate CCR configuration in production environments.
Step 1: Initialize Remote Clusters
On the follower cluster (using curl):

```bash
# Register the leader cluster as a remote on the follower cluster
curl -X PUT "http://follower-cluster:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent":{"cluster":{"remote":{"leader-cluster":{"seeds":["leader-cluster-node1:9300"]}}}}}'

# Verify the connection ("connected": true is expected for leader-cluster)
curl -X GET "http://follower-cluster:9200/_remote/info"
```
Step 2: Configure Index Replication
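On the leader cluster, create the leader index as a normal index. No CCR-specific setting is required, because soft deletes are enabled by default since Elasticsearch 7.0:

```json
PUT /my-index
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}
```

On the follower cluster, do not create the index by hand. The follow request in Step 3 creates the follower index with the same number of primary shards as the leader, and it fails if an index with that name already exists.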
Step 3: Start Data Replication
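Start replication by creating the follower index on the follower cluster:

```json
PUT /my-index/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster": "leader-cluster",
  "leader_index": "my-index"
}
```

- Verify synchronization status: Use `GET /my-index/_ccr/info` to check the follower. `"status": "active"` indicates that the index is following the leader; `"paused"` means replication has been stopped. Use `GET /my-index/_ccr/stats` to check per-shard progress.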
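Replication progress can be read from the follow stats API (`GET /<follower_index>/_ccr/stats`), which reports `leader_global_checkpoint` and `follower_global_checkpoint` for each shard. The sketch below computes the difference per shard; the function name and the sample response are assumptions made for illustration, and the sample contains only the fields the function reads.

```python
def shard_lag(follow_stats):
    """Map (follower_index, shard_id) to how many operations the follower is behind."""
    lag = {}
    for index_stats in follow_stats.get("indices", []):
        for shard in index_stats.get("shards", []):
            key = (index_stats["index"], shard["shard_id"])
            lag[key] = (
                shard["leader_global_checkpoint"]
                - shard["follower_global_checkpoint"]
            )
    return lag


# Hand-written sample in the shape of a follow stats response (fields trimmed).
sample_stats = {
    "indices": [
        {
            "index": "my-index",
            "shards": [
                {"shard_id": 0, "leader_global_checkpoint": 1024,
                 "follower_global_checkpoint": 768},
                {"shard_id": 1, "leader_global_checkpoint": 500,
                 "follower_global_checkpoint": 500},
            ],
        }
    ]
}

for (index, shard_id), behind in shard_lag(sample_stats).items():
    print(f"{index}[{shard_id}] is {behind} operations behind")
```

A lag that stays at zero or shrinks over time means the follower is keeping up; a lag that keeps growing is the signal to investigate network throughput or the follow parameters.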
Step 4: Monitoring and Troubleshooting
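- Monitoring metrics: Check `bytes_read`, `operations_read`, and `operations_written` in the response of `GET /my-index/_ccr/stats`, or view them in Kibana.
- Common issues:
  - Network problems: Verify firewall rules to ensure that the follower cluster can reach the leader cluster on the transport port (9300 by default).
  - High latency: Compare `leader_global_checkpoint` with `follower_global_checkpoint` in the follow stats. If the gap keeps growing, tune follow parameters such as `max_read_request_operation_count`, `max_outstanding_read_requests`, and `read_poll_timeout`.
  - Replication errors: Follower indices are read-only, so write conflicts do not occur on the follower. Use `GET /my-index/_ccr/stats` and check `failed_read_requests`, `read_exceptions`, and `fatal_exception` to detect failing shards.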
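The follow stats API (`GET /<follower_index>/_ccr/stats`) also reports failure counters and any fatal exception per shard, which makes a simple triage script possible. The sketch below is one way to turn such a response into warnings; the function name, message wording, and sample data are assumptions, and a real check would likely also alert on lag.

```python
def find_problems(follow_stats):
    """Return warnings for shards that report failed requests or a fatal exception."""
    problems = []
    for index_stats in follow_stats.get("indices", []):
        for shard in index_stats.get("shards", []):
            name = f"{index_stats['index']}[{shard['shard_id']}]"
            if shard.get("fatal_exception"):
                problems.append(f"{name}: fatal exception, following must be resumed manually")
            if shard.get("failed_read_requests", 0) > 0:
                problems.append(f"{name}: {shard['failed_read_requests']} failed read requests")
            if shard.get("failed_write_requests", 0) > 0:
                problems.append(f"{name}: {shard['failed_write_requests']} failed write requests")
    return problems


# Hand-written sample: shard 0 is healthy, shard 1 reports failures.
sample_stats = {
    "indices": [
        {
            "index": "my-index",
            "shards": [
                {"shard_id": 0, "failed_read_requests": 0, "failed_write_requests": 0},
                {"shard_id": 1, "failed_read_requests": 3, "failed_write_requests": 0,
                 "fatal_exception": {"type": "illustrative_exception"}},
            ],
        }
    ]
}

for warning in find_problems(sample_stats):
    print(warning)
```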
Best Practices and Recommendations
- Network configuration: Ensure low-latency, high-bandwidth connections between clusters. Use VPC networks for isolation to avoid public internet risks.
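- Data volume management: Only replicate the indices you need. CCR is asynchronous and does not block writes on the leader, but write-heavy indices add read load on the leader and network traffic, so watch follower lag and tune the follow parameters accordingly.
- Security hardening: Enable Elasticsearch security features (`xpack.security`) and encrypt the transport connections between clusters with TLS. Give the replication user only the privileges it needs, for example `manage_ccr` on the follower cluster and `read_ccr` on the leader cluster.
- Disaster recovery design: Configure multiple replicas on the follower cluster to avoid single points of failure. For example, set `index.number_of_replicas: 2`.
- Test environment: Validate CCR in development clusters first. Test the follow request with `curl` against the follower cluster:

```bash
curl -X PUT "http://follower-cluster:9200/my-index/_ccr/follow?wait_for_active_shards=1" \
  -H 'Content-Type: application/json' \
  -d '{"remote_cluster": "leader-cluster", "leader_index": "my-index"}'
```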
Conclusion
Elasticsearch CCR achieves efficient and reliable cross-cluster data replication by combining remote cluster registration with sequence-number-based, pull-style replication at the shard level. It is suitable for cloud-native architectures and multi-region deployments, and it significantly enhances system resilience. Developers should follow a three-step workflow to avoid common pitfalls: register the remote cluster, create the follower index, then monitor and verify. For large-scale production environments, use the Elastic Stack monitoring features in Kibana together with the CCR stats APIs to track synchronization health continuously. With proper configuration, CCR can serve as a foundation for building distributed data platforms.
Reference resources: Elasticsearch Official CCR Documentation