What is Kafka's Consumer Group Rebalance mechanism?

February 21, 17:00

Kafka Consumer Group Rebalance Mechanism

Consumer Group Rebalance is an important mechanism in Kafka used to reallocate Partitions when Consumer Group members change. Understanding the Rebalance mechanism is crucial for ensuring Kafka consumption stability and high availability.

Rebalance Trigger Conditions

1. Consumer Member Changes

  • New Consumer Joins: New Consumer instance joins the Consumer Group
  • Consumer Leaves: Consumer instance exits normally or abnormally
  • Consumer Failure: Consumer crashes or network interruption
  • Consumer Timeout: Consumer fails to send heartbeat for more than session.timeout.ms

2. Partition Count Changes

  • Topic Partition Increases: Topic's Partition count increases
  • Topic Partition Decreases: Topic's Partition count decreases
  • Topic Deletion: Topic is deleted

3. Subscription Changes

  • Consumer Subscribes to New Topic: Consumer starts subscribing to a new Topic
  • Consumer Unsubscribes: Consumer unsubscribes from a Topic

Rebalance Process

1. Rebalance Triggered

  • After a trigger condition is met, the Group Coordinator (the broker managing this Consumer Group) determines that a Rebalance is needed
  • The Group Coordinator signals all Consumers, through heartbeat responses, that they must rejoin the group

2. Join Group Phase

  • All Consumers send JoinGroup requests to Group Coordinator
  • Group Coordinator selects one Consumer as Leader
  • Leader Consumer is responsible for formulating the Partition allocation plan

3. Sync Group Phase

  • Leader Consumer sends the allocation plan to Group Coordinator
  • Group Coordinator sends the allocation plan to all Consumers
  • Consumers receive the allocation plan and start consuming

4. Completion Phase

  • Consumers start consuming assigned Partitions
  • Rebalance process completes
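The JoinGroup/SyncGroup round above can be sketched in a few lines. This is a minimal Python simulation of the coordinator's bookkeeping, not the actual Kafka protocol; `simple_assignor` is a hypothetical stand-in for the leader's assignment strategy.

```python
def simple_assignor(partitions, members):
    """Hypothetical assignor the leader might run: deal partitions out in turn."""
    plan = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        plan[members[i % len(members)]].append(p)
    return plan

def rebalance(consumers, partitions, assignor):
    """Sketch of one JoinGroup/SyncGroup round as seen by the coordinator:
    all members (re)join, one is picked as leader, the leader computes the
    plan, and the coordinator hands each member only its own partitions."""
    members = sorted(consumers)           # JoinGroup: everyone rejoins
    leader = members[0]                   # coordinator designates a leader
    plan = assignor(partitions, members)  # leader computes the assignment
    # SyncGroup: each member receives just its share of the plan
    return leader, {m: plan[m] for m in members}

leader, plan = rebalance(["C2", "C1"], [0, 1, 2], simple_assignor)
print(leader, plan)  # C1 {'C1': [0, 2], 'C2': [1]}
```

In the real protocol the leader is simply one group member chosen by the coordinator; the coordinator itself never inspects the plan, which is why custom assignors can run purely client-side.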

Rebalance Strategies

Range Strategy (Default)

Principle: For each Topic, assign Partitions to Consumers in sorted order, giving each Consumer a contiguous range

Allocation Rules:

  • Sort Partitions in numerical order
  • Sort Consumers by name
  • Each Consumer gets a continuous range of Partitions

Example:

```shell
Topic: test-topic, Partitions: 0,1,2,3,4,5
Consumers: C1, C2, C3

Allocation Result:
C1: 0,1
C2: 2,3
C3: 4,5
```

Characteristics:

  • Relatively even allocation
  • May cause uneven allocation (when Partition count is not divisible by Consumer count)
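The Range allocation rules can be reproduced in a short Python sketch. This mimics the RangeAssignor's per-topic logic for illustration; it is not the Kafka client code.

```python
def range_assign(partitions, consumers):
    """Give each consumer a contiguous range of a topic's partitions,
    mimicking Kafka's RangeAssignor for a single topic."""
    consumers = sorted(consumers)
    partitions = sorted(partitions)
    base, extra = divmod(len(partitions), len(consumers))
    result, start = {}, 0
    for i, c in enumerate(consumers):
        # the first `extra` consumers each absorb one leftover partition
        count = base + (1 if i < extra else 0)
        result[c] = partitions[start:start + count]
        start += count
    return result

print(range_assign([0, 1, 2, 3, 4, 5], ["C1", "C2", "C3"]))
# {'C1': [0, 1], 'C2': [2, 3], 'C3': [4, 5]}
print(range_assign([0, 1, 2, 3, 4, 5, 6], ["C1", "C2", "C3"]))
# uneven: C1 gets 3 partitions, C2 and C3 get 2 each
```

The second call shows the skew mentioned above: when the Partition count is not divisible by the Consumer count, the first Consumers in sort order carry the extra load, and the skew compounds across many Topics.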

RoundRobin Strategy

Principle: Round-robin allocation of Partitions

Allocation Rules:

  • Merge Partitions from all Topics
  • Allocate to Consumers in round-robin fashion

Example:

```shell
Topic1: 0,1,2  Topic2: 0,1
Consumers: C1, C2

Allocation Result:
C1: Topic1-0, Topic1-2, Topic2-1
C2: Topic1-1, Topic2-0
```

Characteristics:

  • More even allocation
  • Suitable for multiple Topics
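The RoundRobin rules are also easy to sketch; the snippet below reproduces the example above. Again, this is an illustrative simulation, not the RoundRobinAssignor itself.

```python
from itertools import cycle

def round_robin_assign(topic_partitions, consumers):
    """Merge partitions from all topics, sort them, and deal them out
    to consumers in turn, mimicking Kafka's RoundRobinAssignor."""
    result = {c: [] for c in sorted(consumers)}
    turn = cycle(sorted(consumers))
    for tp in sorted(topic_partitions):
        result[next(turn)].append(tp)
    return result

tps = ["Topic1-0", "Topic1-1", "Topic1-2", "Topic2-0", "Topic2-1"]
print(round_robin_assign(tps, ["C1", "C2"]))
# {'C1': ['Topic1-0', 'Topic1-2', 'Topic2-1'], 'C2': ['Topic1-1', 'Topic2-0']}
```

Because partitions from all Topics are merged before dealing, the per-Consumer load difference is at most one partition, which is why this strategy spreads multi-Topic workloads more evenly than Range.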

Sticky Strategy

Principle: Maintain original allocation as much as possible while ensuring even allocation

Characteristics:

  • Reduce Partition movement between Consumers
  • Reduce Rebalance impact
  • Improve consumption continuity
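The "keep the original allocation" idea can be shown with a simplified sketch: partitions stay with their previous owner when possible, and only orphaned partitions (whose owner left) are redistributed. The real StickyAssignor also rebalances load more aggressively; this sketch only captures the stickiness.

```python
def sticky_reassign(partitions, previous, live_consumers):
    """Simplified sticky reassignment: surviving consumers keep their
    previous partitions; only orphaned partitions are redistributed
    to the least-loaded consumer."""
    result = {c: [p for p in previous.get(c, []) if p in partitions]
              for c in live_consumers}
    assigned = {p for ps in result.values() for p in ps}
    orphans = [p for p in sorted(partitions) if p not in assigned]
    for p in orphans:
        # hand each orphan to whichever consumer currently has the fewest
        target = min(result, key=lambda c: len(result[c]))
        result[target].append(p)
    return result

previous = {"C1": [0, 1], "C2": [2, 3], "C3": [4, 5]}
# C3 leaves the group: only partitions 4 and 5 move
print(sticky_reassign([0, 1, 2, 3, 4, 5], previous, ["C1", "C2"]))
# {'C1': [0, 1, 4], 'C2': [2, 3, 5]}
```

Contrast this with Range or RoundRobin, where C3 leaving could reshuffle every partition, invalidating per-partition caches and in-flight batches on C1 and C2 as well.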

CooperativeSticky Strategy

Principle: Incremental Rebalance, only reallocate affected Partitions

Characteristics:

  • Reduce Stop-the-world time
  • Improve system availability
  • Suitable for scenarios with high continuity requirements

Rebalance Configuration

Key Parameters

```properties
# Session timeout
session.timeout.ms=30000
# Heartbeat interval
heartbeat.interval.ms=3000
# Maximum poll interval
max.poll.interval.ms=300000
# Maximum records per poll
max.poll.records=500
```

Parameter Description

  • session.timeout.ms: If the Coordinator receives no heartbeat from a Consumer within this window, the Consumer is considered dead and a Rebalance is triggered
  • heartbeat.interval.ms: Interval at which Consumer sends heartbeats
  • max.poll.interval.ms: Maximum interval between Consumer polls
  • max.poll.records: Maximum number of messages returned per poll
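These parameters constrain each other: Kafka's documentation recommends keeping heartbeat.interval.ms at no more than one third of session.timeout.ms, so a couple of missed heartbeats do not immediately evict the Consumer. A small sketch of such a sanity check (the function and the second rule's exact form are illustrative, not a Kafka API):

```python
def check_consumer_timeouts(cfg):
    """Sanity-check the recommended relationships between consumer timeouts."""
    problems = []
    hb = cfg["heartbeat.interval.ms"]
    session = cfg["session.timeout.ms"]
    if hb * 3 > session:
        problems.append("heartbeat.interval.ms should be <= session.timeout.ms / 3")
    if cfg["max.poll.interval.ms"] <= session:
        # processing time between polls is normally allowed to exceed the session timeout
        problems.append("max.poll.interval.ms is usually set well above session.timeout.ms")
    return problems

print(check_consumer_timeouts({
    "session.timeout.ms": 30000,
    "heartbeat.interval.ms": 3000,
    "max.poll.interval.ms": 300000,
}))  # [] -> the values from the example config above pass
```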

Rebalance Problems and Solutions

1. Frequent Rebalance Triggers

Causes:

  • Consumers frequently go online/offline
  • Network instability causes heartbeat timeout
  • Message processing takes too long

Solutions:

```properties
# Increase session timeout
session.timeout.ms=60000
# Increase heartbeat interval
heartbeat.interval.ms=5000
# Increase poll interval
max.poll.interval.ms=600000
```

2. Long Rebalance Time

Causes:

  • Too many Consumers
  • Too many Partitions
  • High network latency

Solutions:

  • Use CooperativeSticky strategy
  • Reduce Consumer count
  • Optimize network configuration

3. Rebalance Causes Duplicate Message Consumption

Causes:

  • Offsets for already-processed messages are not committed before Partitions are revoked
  • After the Rebalance, the new owner resumes from the last committed Offset and re-reads those messages

Solutions:

```properties
# Disable auto commit
enable.auto.commit=false
```

Then commit Offsets manually in the consumer code after processing, e.g. `consumer.commitSync();` in the Java client.
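Why late commits cause duplicates is easy to see in a pure-Python simulation of Offset semantics (no Kafka involved; `consume` is an illustrative stand-in for a poll-process-commit loop):

```python
def consume(messages, start_offset, commit_every, crash_after=None):
    """Process messages from start_offset, committing the offset every
    `commit_every` records. If the consumer 'crashes' (e.g. its partition
    is rebalanced away) before the next commit, return what was processed
    and the last committed offset."""
    processed, committed = [], start_offset
    for i, msg in enumerate(messages[start_offset:], start=start_offset):
        processed.append(msg)
        if (i + 1 - start_offset) % commit_every == 0:
            committed = i + 1  # commit up to and including this message
        if crash_after is not None and len(processed) == crash_after:
            return processed, committed  # crashed before the next commit
    return processed, len(messages)

msgs = list("abcdefgh")
# first consumer processes 5 messages but only commits after every 3rd
first, committed = consume(msgs, 0, commit_every=3, crash_after=5)
# after the Rebalance, the new owner resumes from the committed offset
second, _ = consume(msgs, committed, commit_every=3)
print(first, committed, second)
# 'd' and 'e' are processed twice: this is at-least-once delivery
```

Committing after each processed record shrinks the replay window to at most one message, at the cost of more commit traffic; that trade-off is exactly what manual `commitSync()` lets you control.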

4. Rebalance Causes Consumption Interruption

Causes:

  • Consumers stop consuming during Rebalance
  • Stop-the-world time too long

Solutions:

  • Use CooperativeSticky strategy
  • Reduce Rebalance trigger frequency
  • Optimize Rebalance configuration

Best Practices

1. Reasonably Configure Parameters

```properties
# Recommended configuration
session.timeout.ms=30000
heartbeat.interval.ms=3000
max.poll.interval.ms=300000
max.poll.records=500
```

2. Choose Appropriate Rebalance Strategy

  • General Scenarios: Use Range or RoundRobin
  • Need Continuity: Use Sticky
  • High Availability Requirements: Use CooperativeSticky
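In the Java client the strategy is selected with the partition.assignment.strategy property, using the assignor class names shipped with the client:

```properties
# Range (default):  org.apache.kafka.clients.consumer.RangeAssignor
# Round robin:      org.apache.kafka.clients.consumer.RoundRobinAssignor
# Sticky:           org.apache.kafka.clients.consumer.StickyAssignor
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
```

Note that when migrating a running group to CooperativeStickyAssignor, all members must support the new protocol; consult the upgrade notes of your Kafka client version before switching.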

3. Monitor Rebalance

```bash
# View Consumer Group status
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --describe --group my-group

# View Rebalance logs
tail -f /path/to/kafka/logs/server.log | grep Rebalance
```

4. Optimize Consumption Logic

  • Avoid long processing time for single messages
  • Use asynchronous processing to improve efficiency
  • Reasonably set batch processing size

5. Prevent Rebalance

  • Keep Consumers running stably
  • Avoid frequent Consumer start/stop
  • Monitor Consumer health status

Rebalance Monitoring Metrics

The Java client's consumer coordinator exposes Rebalance metrics, for example:

  • rebalance-rate-per-hour: Rebalances per hour
  • rebalance-total: Total number of Rebalances
  • failed-rebalance-rate-per-hour: Failed Rebalances per hour
  • rebalance-latency-avg / rebalance-latency-max: Average / maximum Rebalance duration

Through reasonable configuration and optimization of the Rebalance mechanism, the impact of Rebalance on the system can be effectively reduced, improving Kafka consumption stability and availability.

Tags: Kafka