What is a Shard in Elasticsearch?
In Elasticsearch, a shard is the unit by which an index is split and distributed across the nodes of a cluster, so that data can be stored and processed in parallel. Sharding is a core mechanism for scalability and, together with replication, for high availability. Each shard is a self-contained Lucene index that holds a portion of the index's data. Documents are assigned to shards by a routing rule based on a hash of the routing value (by default, the document ID).
What Types of Shards Exist in Elasticsearch?
Elasticsearch has two types of shards:
- Primary Shard: The primary shard is where a document is first indexed. The number of primary shards is set when the index is created and cannot be changed in place afterwards; changing it means creating a new index, for example with the split, shrink, or reindex APIs. Each document belongs to exactly one primary shard, chosen by Elasticsearch's routing algorithm.
- Replica Shard: A replica shard is a copy of a primary shard. It provides redundancy (protecting against data loss if a node fails) and serves read requests. The number of replicas can be changed at any time after index creation. A read can be served by the primary shard or by any of its replicas, which raises read throughput under heavy load.
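The routing step described above can be illustrated with a short sketch. It is a simplification, not Elasticsearch's actual implementation: Elasticsearch hashes the routing value with Murmur3 and uses a slightly more involved formula, while this sketch assumes a plain "hash modulo shard count" rule and substitutes CRC32 from the Python standard library. The function name `pick_shard` and the document IDs are hypothetical.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a primary shard number.

    Assumption: CRC32 stands in for the Murmur3 hash that Elasticsearch
    uses internally; only the "hash, then modulo" idea is illustrated.
    """
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % num_primary_shards


# Hypothetical document IDs for a book index.
doc_ids = [f"book-{n}" for n in range(10)]

# With 5 primary shards, every document lands on a shard numbered 0..4.
for doc_id in doc_ids:
    print(f"{doc_id} -> shard {pick_shard(doc_id, 5)}")

# If the shard count changed from 5 to 6, many documents would map to a
# different shard, so existing data could no longer be found by routing.
moved = [d for d in doc_ids if pick_shard(d, 5) != pick_shard(d, 6)]
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

Because the shard number depends on the primary shard count, changing that count would invalidate the placement of documents already indexed. This is why the number of primary shards is fixed for the life of an index.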
Example
Suppose you have an Elasticsearch index holding a large catalog of book information. You can configure 5 primary shards with 1 replica per primary. The data is then split across the 5 primary shards, and each primary has one replica, for 10 shards in total. Elasticsearch never places a replica on the same node as its primary, so in a cluster with at least two nodes, the failure of a single node still leaves a complete copy of every shard. A surviving replica is promoted to primary and queries are served by the remaining copies, which keeps the application available and responsive.
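As a rough sketch, the configuration in this example could be expressed as the request bodies below. The setting names `number_of_shards` and `number_of_replicas` are Elasticsearch index settings; the index name `books` and the helper functions are assumptions made for illustration. The code only builds and prints the JSON bodies and does not contact a cluster.

```python
import json


def index_settings(primaries: int, replicas: int) -> dict:
    """Build a create-index body with the given shard layout."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,     # fixed once the index exists
                "number_of_replicas": replicas,    # can be changed later
            }
        }
    }


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary shard has `replicas` copies in addition to itself."""
    return primaries * (1 + replicas)


# Body for creating the hypothetical "books" index: PUT /books
create_body = index_settings(primaries=5, replicas=1)
print("PUT /books")
print(json.dumps(create_body, indent=2))
print("total shards:", total_shard_copies(5, 1))

# Body for raising the replica count later: PUT /books/_settings
replica_update = {"index": {"number_of_replicas": 2}}
print("PUT /books/_settings")
print(json.dumps(replica_update, indent=2))
print("total shards after update:", total_shard_copies(5, 2))
```

With 5 primaries and 1 replica the cluster holds 10 shards. Raising the replica count to 2 brings the total to 15 without touching the primary shard count.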