Consul uses the Raft consensus algorithm to ensure data consistency in distributed systems, which is the core foundation of its high availability and reliability.
Raft Protocol Overview
Raft is a consensus algorithm designed to be easy to understand. It decomposes the consensus problem into several relatively independent sub-problems:
- Leader Election: Elect a leader to manage log replication
- Log Replication: Leader receives client requests and replicates to other nodes
- Safety: Ensure committed logs are not lost
Raft Implementation in Consul
Node Roles
Consul Server nodes have three roles in the Raft cluster:
- Leader: Handles all client requests, responsible for log replication
- Follower: Passively receives log replication requests from Leader
- Candidate: Temporary state participating in leader election
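As an illustration, the role constants that the pseudo-code later in this section assigns to rf.state could be modelled with a small Go type; this is a sketch, not Consul's actual source:

```go
package raft

// Role models the three states a Consul server can be in within the
// Raft cluster (illustrative type, not Consul's own).
type Role int

const (
	Follower  Role = iota // passively replicates entries from the Leader
	Candidate             // transient state while campaigning for votes
	Leader                // handles client requests and drives replication
)

func (r Role) String() string {
	switch r {
	case Leader:
		return "leader"
	case Candidate:
		return "candidate"
	default:
		return "follower"
	}
}
```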
Leader Election Process
Election Trigger Conditions
- Follower hasn't received Leader heartbeat within election timeout
- During cluster initialization
Election Steps
1. Follower becomes Candidate:
   - Increments the current term
   - Votes for itself
   - Sends RequestVote requests to the other nodes
2. Voting Rules (illustrated by the follower-side sketch after the pseudo-code below):
   - Each node casts at most one vote per term
   - A vote is granted only if the Candidate's log is at least as up-to-date as the voter's
   - Among eligible Candidates, the first request received wins the vote
3. Election Result:
   - Receives votes from a majority: becomes Leader
   - Receives a request or response with a higher term: reverts to Follower
   - Election timeout expires without a majority: starts a new election
```go
// Pseudo code: election logic
func (rf *Raft) startElection() {
	rf.currentTerm++
	rf.state = Candidate
	rf.votedFor = rf.me
	for peer := range rf.peers {
		if peer == rf.me {
			continue // do not request a vote from ourselves
		}
		go rf.sendRequestVote(peer)
	}
}
```
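To complement the candidate-side logic above, here is a follower-side sketch of the voting rules from step 2. It is illustrative only; the type and field names are assumptions, not Consul's implementation:

```go
package raft

// Minimal voter state needed to illustrate the rules (a sketch).
type Voter struct {
	currentTerm  int
	votedFor     int // -1 means "has not voted in this term"
	lastLogIndex int
	lastLogTerm  int
}

type RequestVoteArgs struct {
	Term         int // candidate's term
	CandidateID  int
	LastLogIndex int // index of the candidate's last log entry
	LastLogTerm  int // term of the candidate's last log entry
}

// HandleRequestVote reports whether the vote is granted.
func (v *Voter) HandleRequestVote(args RequestVoteArgs) bool {
	// A higher term moves the voter into that term and clears its vote.
	if args.Term > v.currentTerm {
		v.currentTerm = args.Term
		v.votedFor = -1
	}
	// Rule 1: never grant a vote for an older term.
	if args.Term < v.currentTerm {
		return false
	}
	// Rule 2: at most one vote per term (first valid request wins).
	if v.votedFor != -1 && v.votedFor != args.CandidateID {
		return false
	}
	// Rule 3: the candidate's log must be at least as up-to-date as ours,
	// comparing the last entry's term first and then its index.
	upToDate := args.LastLogTerm > v.lastLogTerm ||
		(args.LastLogTerm == v.lastLogTerm && args.LastLogIndex >= v.lastLogIndex)
	if !upToDate {
		return false
	}
	v.votedFor = args.CandidateID
	return true
}
```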
Log Replication Mechanism
Log Structure
Each node maintains a log array:
```text
Index | Term | Command
------|------|----------
1     | 1    | set x = 1
2     | 1    | set y = 2
3     | 2    | set z = 3
```
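Each row of the table maps naturally onto a small struct. The field names below are illustrative, not Consul's wire format:

```go
package raft

// LogEntry mirrors one row of the table above.
type LogEntry struct {
	Index   uint64 // position in the log, starting at 1
	Term    uint64 // term of the Leader that appended the entry
	Command []byte // opaque state-machine command
}

// The example log from the table:
var exampleLog = []LogEntry{
	{Index: 1, Term: 1, Command: []byte("set x = 1")},
	{Index: 2, Term: 1, Command: []byte("set y = 2")},
	{Index: 3, Term: 2, Command: []byte("set z = 3")},
}
```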
Replication Flow
1. Client Request:
   - The client sends a write request to the Leader
   - The Leader appends the command to its local log
2. AppendEntries RPC:
   - The Leader sends AppendEntries requests to all Followers
   - Each request carries the new entries plus the index and term of the immediately preceding log entry
3. Follower Processing (see the follower-side sketch after the pseudo-code below):
   - Check whether the preceding entry matches the local log
   - If it matches, append the new entries
   - If it does not, reject the request and return conflict information
4. Commit Confirmation:
   - The Leader waits for acknowledgement from a majority of nodes
   - It then commits the entry and applies it to the state machine
   - Finally it notifies the client that the request succeeded
```go
// Pseudo code: log replication / heartbeat loop
func (rf *Raft) replicateLog() {
	for !rf.killed() {
		if rf.state == Leader {
			for peer := range rf.peers {
				if peer == rf.me {
					continue // the Leader does not replicate to itself
				}
				go rf.sendAppendEntries(peer)
			}
		}
		time.Sleep(heartbeatInterval)
	}
}
```
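The follower-side half of this flow, the consistency check and commit handling from steps 3 and 4, can be sketched as follows. As with the other sketches, the type and field names are assumptions rather than Consul's implementation, and the check of the Leader's term against the follower's current term is omitted for brevity:

```go
package raft

// Entry has the same shape as the LogEntry sketch earlier in the section.
type Entry struct {
	Index   uint64
	Term    uint64
	Command []byte
}

type AppendEntriesArgs struct {
	Term         uint64  // Leader's term
	PrevLogIndex uint64  // index of the entry immediately before Entries
	PrevLogTerm  uint64  // term of that entry
	Entries      []Entry // new entries to append (empty for a heartbeat)
	LeaderCommit uint64  // Leader's commit index
}

type FollowerLog struct {
	entries     []Entry // entries[0] is a sentinel with index 0, term 0
	commitIndex uint64
}

// HandleAppendEntries reports whether the entries were accepted.
func (f *FollowerLog) HandleAppendEntries(args AppendEntriesArgs) bool {
	// Consistency check: the follower must already hold the entry the
	// Leader claims precedes the new ones, with the same term; otherwise
	// it rejects and the Leader retries from an earlier index.
	if args.PrevLogIndex >= uint64(len(f.entries)) ||
		f.entries[args.PrevLogIndex].Term != args.PrevLogTerm {
		return false
	}
	// Drop any conflicting suffix and append the new entries.
	f.entries = append(f.entries[:args.PrevLogIndex+1], args.Entries...)
	// Advance the commit index, but never past the last entry held locally.
	if last := uint64(len(f.entries) - 1); args.LeaderCommit > f.commitIndex {
		if args.LeaderCommit < last {
			f.commitIndex = args.LeaderCommit
		} else {
			f.commitIndex = last
		}
	}
	return true
}
```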
Consistency Guarantees
Log Matching Property
- If two logs contain an entry with the same index and term, they store the same command and all preceding entries are identical
- A Leader never overwrites or deletes entries in its own log; it only appends
Leader Completeness
- Only nodes containing all committed logs can become Leader
- Prevents data loss from old Leader being re-elected
Safety Guarantee
- Only committed logs can be applied to state machine
- Clients only see results of committed write operations
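In practice this last guarantee is enforced by an apply loop that only hands entries up to the commit index to the state machine. A minimal sketch (the names are assumptions, not Consul's internals):

```go
package raft

// StateMachine is whatever committed commands are applied to; in Consul's
// case, the FSM backing the KV store, catalog, sessions, and so on.
type StateMachine interface {
	Apply(command []byte)
}

type Node struct {
	log         [][]byte // log[i] holds the command at index i (index 0 unused)
	commitIndex int      // highest index known to be committed
	lastApplied int      // highest index already applied to the state machine
	fsm         StateMachine
}

// applyCommitted applies every committed-but-unapplied entry in order.
// Entries beyond commitIndex are never handed to the state machine, so
// clients can only observe the results of committed writes.
func (n *Node) applyCommitted() {
	for n.lastApplied < n.commitIndex {
		n.lastApplied++
		n.fsm.Apply(n.log[n.lastApplied])
	}
}
```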
Consul Raft Configuration
Basic Configuration
```hcl
server           = true
bootstrap_expect = 3
datacenter       = "dc1"
data_dir         = "/opt/consul/data"
```
Key Parameters
- bootstrap_expect: Number of Server nodes expected before the cluster bootstraps itself
- raft_protocol: Raft protocol version the servers run
- Election timeout, heartbeat timeout, and leader lease timeout: internal timings of the embedded hashicorp/raft library; Consul does not expose them as individual settings but scales all of them through raft_multiplier in the performance stanza
```hcl
raft_protocol = 3

performance {
  # Scales the Raft election, heartbeat, and leader-lease timeouts.
  # 1 gives the tightest (production-recommended) timings.
  raft_multiplier = 1
}
```
Failure Recovery
Leader Failure
- Follower detects Leader failure (heartbeat timeout)
- Triggers election, elects new Leader
- New Leader continues unfinished log replication
Network Partition
- Majority partition continues service
- Minority partition cannot commit new logs
- After the partition heals, the majority-side Leader keeps leading and the minority nodes catch up from it (the arithmetic behind this is sketched below)
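The reason the minority side stalls is simple arithmetic: committing an entry requires acknowledgement from a strict majority of the full server set, which a minority partition can never assemble. A small illustrative sketch (not Consul code):

```go
package raft

// quorumSize is the number of servers that must acknowledge an entry
// before it can be committed, counted against the full cluster
// membership rather than the currently reachable subset.
func quorumSize(clusterSize int) int {
	return clusterSize/2 + 1
}

// canCommit reports whether a partition holding `reachable` of the
// cluster's servers can still commit new log entries.
func canCommit(clusterSize, reachable int) bool {
	return reachable >= quorumSize(clusterSize)
}
```

With five servers split 3/2, quorumSize(5) is 3: the three-node side keeps committing while the two-node side cannot, which is also why the minority cannot elect a competing Leader.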
Node Restart
- Restarting node recovers state from snapshot
- Catches up to latest state through log replication
- Participates normally in cluster after catch-up
Performance Optimization
Batch Log Replication
The embedded hashicorp/raft library already batches multiple log entries into a single AppendEntries RPC, so batching itself needs no configuration. The main Raft performance knob Consul exposes is raft_multiplier in the performance stanza, which scales the internal Raft timeouts (1 is the tightest, production-recommended setting; larger values relax the timeouts for slower hardware):
```hcl
performance {
  # 1 = tightest timings (recommended for production servers);
  # larger values relax Raft timeouts for constrained hardware.
  raft_multiplier = 1
}
```
Snapshot Mechanism
Periodically create snapshots to reduce log size:
```hcl
# Snapshot configuration
raft_snapshot_interval  = "30s"
raft_snapshot_threshold = 8192
```
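Conceptually, the threshold-based trigger boils down to counting how many entries have accumulated since the last snapshot; once a snapshot is persisted, the log up to its index can be discarded, which keeps restarts and follower catch-up fast. A rough sketch of that check (not Consul's code):

```go
package raft

// shouldSnapshot reports whether enough new entries have accumulated
// since the last snapshot to justify taking a new one.
func shouldSnapshot(lastSnapshotIndex, lastLogIndex, threshold uint64) bool {
	return lastLogIndex-lastSnapshotIndex >= threshold
}
```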
Pre-vote Mechanism
Pre-vote prevents unnecessary elections after network partitions: before incrementing its term and starting a real election, a candidate first asks its peers whether they would grant it a vote, so a node that was isolated and then rejoins cannot disrupt a healthy Leader with an inflated term. In Consul this behavior is provided by newer versions of the embedded hashicorp/raft library rather than by a dedicated agent configuration key.
Monitoring and Debugging
Raft Status Query
```bash
# View Raft peers and their state
consul operator raft list-peers

# View the local agent's Raft status (raft section of the output)
consul info

# Remove a failed peer from the Raft configuration
consul operator raft remove-peer -id=node1
```
Log Analysis
```bash
# View Raft-related log lines
journalctl -u consul -f | grep raft
```
Best Practices
- Odd number of Server nodes: run 3, 5, or 7 servers; an even count adds no extra quorum fault tolerance
- Spread across failure domains: distribute a datacenter's Server nodes across availability zones while keeping the latency between them low
- Regular backup: back up Raft state with consul snapshot save in addition to backing up the data directory
- Monitor metrics: Monitor election count, log latency, commit latency
- Version upgrade: Rolling upgrade, avoid upgrading multiple nodes simultaneously
Consul's Raft implementation ensures strong consistency in distributed environments and is the foundation for building high-availability service discovery systems.