The concept of rebalancing is fundamental to Kafka's consumer group architecture. When consumers form a consumer group, the partitions of the topics the group subscribes to are divided among them. A broker acting as the group coordinator manages this process. Each consumer is responsible for consuming data from its assigned partitions. However, when consumers join or leave the group, or new partitions are added to a topic, the existing assignment no longer fits: a new consumer has no partitions, a departed consumer's partitions have no owner, and new partitions are not yet assigned. This is where rebalancing comes into play[1].
How rebalancing works
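During a rebalance, a partition assignment strategy, called an “assignor”, determines which consumer receives which partitions. It is selected with the consumer setting partition.assignment.strategy. In the Java consumer the default is the “range” assignor, which gives each consumer a contiguous range of each topic's partitions. Since Kafka 3.0 the default list is range followed by cooperative sticky, so a group can later switch to cooperative sticky without downtime. Kafka also provides “round robin”, which assigns partitions to consumers one after another, as well as “sticky” and “cooperative sticky” assignors, which may be more appropriate for specific use cases.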
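To make the difference between range and round robin concrete, here is a simplified model of both. It is a sketch, not Kafka's code: it assumes every consumer subscribes to every topic, and the function names are hypothetical.

```python
"""Simplified models of the range and round-robin assignors.

Assumption: every consumer subscribes to every topic. Kafka's real assignors
also handle differing subscriptions and other details left out here.
"""
from collections import defaultdict


def range_assign(consumers, topic_partitions):
    """Per topic, give each consumer a contiguous range of partitions.

    topic_partitions maps a topic name to its partition count.
    """
    members = sorted(consumers)
    assignment = defaultdict(list)
    for topic in sorted(topic_partitions):
        per_member, extra = divmod(topic_partitions[topic], len(members))
        start = 0
        for i, member in enumerate(members):
            # The first `extra` members (in sorted order) get one more partition.
            count = per_member + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return {m: assignment[m] for m in members}


def round_robin_assign(consumers, topic_partitions):
    """Deal every partition of every topic to the consumers one after another."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    all_partitions = [
        (topic, p)
        for topic in sorted(topic_partitions)
        for p in range(topic_partitions[topic])
    ]
    for i, tp in enumerate(all_partitions):
        assignment[members[i % len(members)]].append(tp)
    return assignment


if __name__ == "__main__":
    topics = {"t0": 3, "t1": 3}
    print(range_assign(["c1", "c2"], topics))
    print(round_robin_assign(["c1", "c2"], topics))
```

With two consumers and two three-partition topics, range gives c1 four partitions and c2 only two, because the leftover partition of each topic goes to the same first consumer. Round robin gives each consumer three.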
When a rebalance occurs:
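- The group coordinator signals the rebalance, typically by returning a REBALANCE_IN_PROGRESS error in its responses to the consumers' heartbeats.
- Each consumer then sends a JoinGroup request, indicating its willingness to participate in the rebalance. The coordinator picks one member as the group leader and sends it the list of members and their subscriptions.
- The leader uses the selected assignor to assign partitions to each consumer and sends the result to the coordinator in a SyncGroup request. The coordinator then returns each consumer's own assignment in its SyncGroup response.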
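The steps above can be traced with a toy model that records the messages exchanged. Only the request names (JoinGroup, SyncGroup) and the REBALANCE_IN_PROGRESS error come from Kafka's protocol. The Python names, the rule that the first joiner leads, and the stand-in assignor are assumptions of this sketch, and timeouts and failures are ignored.

```python
"""Toy model of the classic consumer group rebalance handshake.

Only the request names (JoinGroup, SyncGroup) and the REBALANCE_IN_PROGRESS
error come from Kafka's protocol; the Python names are hypothetical, and
timeouts, generation ids, and failures are ignored.
"""


def simple_assignor(members, partitions):
    """Stand-in assignor: deal partitions over the sorted member ids."""
    ordered = sorted(members)
    plan = {m: [] for m in ordered}
    for i, partition in enumerate(partitions):
        plan[ordered[i % len(ordered)]].append(partition)
    return plan


def rebalance(members, partitions, assignor=simple_assignor):
    """Return (assignment, message log) for one rebalance round.

    `members` is given in JoinGroup arrival order.
    """
    log = []
    # 1. The coordinator signals the rebalance in its heartbeat responses.
    for m in members:
        log.append(f"coordinator -> {m}: Heartbeat error REBALANCE_IN_PROGRESS")
    # 2. Every member sends JoinGroup; in this model the first joiner leads.
    for m in members:
        log.append(f"{m} -> coordinator: JoinGroup")
    leader = members[0]
    log.append(f"coordinator -> {leader}: JoinGroup response with member list")
    # 3. The leader runs the assignor client-side and sends the full plan.
    plan = assignor(members, partitions)
    log.append(f"{leader} -> coordinator: SyncGroup with full assignment")
    # 4. Followers send an empty SyncGroup; everyone receives only its share.
    for m in members:
        if m != leader:
            log.append(f"{m} -> coordinator: SyncGroup (empty)")
        log.append(f"coordinator -> {m}: SyncGroup response {plan[m]}")
    return plan, log


if __name__ == "__main__":
    plan, log = rebalance(["c1", "c2"], ["t-0", "t-1", "t-2"])
    print("\n".join(log))
```

The design point the model highlights is that the assignment is computed on the client side by the leader, while the coordinator only relays it, which is why the assignor is a consumer setting.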
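During a rebalance, consumption pauses for the partitions being reassigned. With the eager protocol, used by the range, round-robin, and sticky assignors, every consumer revokes all of its partitions before rejoining, so the whole group stops consuming. The cooperative sticky assignor revokes only the partitions that actually move. The pause ensures that no partition is consumed by two consumers at once and that all consumers have an up-to-date view of the partition assignments before they resume consuming.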
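The cost difference between the two protocols can be sketched by counting which partitions stop being consumed when a third consumer joins. This is a simplified model with a hypothetical function name; in the real cooperative protocol the revoked partitions are handed to their new owner in a follow-up rebalance, which the sketch does not show.

```python
"""Sketch comparing how many partitions stop being consumed under the eager
and cooperative rebalance protocols. Simplified model: in the real cooperative
protocol, revoked partitions are handed to their new owner in a follow-up
rebalance, which this sketch does not show.
"""


def paused_partitions(old, new, protocol):
    """Map each member to the partitions it stops consuming during the rebalance.

    old/new map member id -> iterable of partitions before/after the rebalance.
    """
    paused = {}
    for member in sorted(set(old) | set(new)):
        owned = set(old.get(member, ()))
        if protocol == "eager":
            # Eager: every member revokes everything it owns before rejoining.
            paused[member] = owned
        elif protocol == "cooperative":
            # Cooperative: only partitions moving to another member are revoked.
            paused[member] = owned - set(new.get(member, ()))
        else:
            raise ValueError(f"unknown protocol: {protocol}")
    return paused


if __name__ == "__main__":
    before = {"c1": {0, 1, 2}, "c2": {3, 4, 5}}
    after = {"c1": {0, 1}, "c2": {3, 4}, "c3": {2, 5}}  # c3 has joined
    for protocol in ("eager", "cooperative"):
        paused = paused_partitions(before, after, protocol)
        print(protocol, sum(len(p) for p in paused.values()))
```

In this example all six partitions pause under the eager protocol, while only the two that move to c3 pause under the cooperative one.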
When a rebalance happens
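- A consumer joins or leaves the group
- A consumer fails temporarily and sends no heartbeat for longer than session.timeout.ms
- A consumer goes too long between calls to poll() (longer than max.poll.interval.ms), for example because processing a batch is slow
- Partitions are added to a subscribed topic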
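The triggers above can be expressed as two hypothetical helper functions. The timeout parameters mirror the consumer configs session.timeout.ms and max.poll.interval.ms; the defaults used here (45 s and 5 min) are assumed to match recent Java clients and should be checked against the version in use.

```python
"""Hypothetical helpers classifying the rebalance triggers listed above.

The timeout parameters mirror the consumer configs session.timeout.ms and
max.poll.interval.ms; the defaults used here (45 s and 5 min) are assumed to
match recent Java clients but should be checked against your version.
"""


def liveness_trigger(now_ms, last_heartbeat_ms, last_poll_ms,
                     session_timeout_ms=45_000, max_poll_interval_ms=300_000):
    """Return why a member would be removed from the group, or None."""
    if now_ms - last_heartbeat_ms > session_timeout_ms:
        # Temporary failure: the coordinator stops hearing heartbeats.
        return "session timeout"
    if now_ms - last_poll_ms > max_poll_interval_ms:
        # Heartbeats may still be sent, but poll() has not been called in time.
        return "max poll interval exceeded"
    return None


def membership_triggers(old_members, new_members, old_counts, new_counts):
    """List group or topic changes that require a rebalance.

    old_counts/new_counts map a subscribed topic to its partition count.
    """
    reasons = [f"joined: {m}" for m in sorted(set(new_members) - set(old_members))]
    reasons += [f"left: {m}" for m in sorted(set(old_members) - set(new_members))]
    for topic, count in sorted(new_counts.items()):
        if count > old_counts.get(topic, 0):
            reasons.append(f"partitions added: {topic}")
    return reasons


if __name__ == "__main__":
    print(liveness_trigger(now_ms=100_000, last_heartbeat_ms=50_000, last_poll_ms=90_000))
    print(membership_triggers(["c1", "c2"], ["c2", "c3"], {"t": 3}, {"t": 6}))
```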
Side effects of Kafka rebalancing
- Increased latency
- Reduced throughput
- Increased resource usage
- Potential data duplication and loss
- Increased complexity
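Kafka rebalancing may result in significant data duplication (40% or more in some cases), adding to throughput and cost issues. Duplication happens when a consumer has processed messages but not yet committed their offsets: the partition's new owner resumes from the last committed offset and processes those messages again. In rare cases, improperly handled rebalancing leads to data loss. For example, a consumer may leave the group while it still has unprocessed messages whose offsets were already committed. The partition's new owner starts after the committed offset, so those messages are never processed. To prevent data loss, commit offsets only for messages that have actually been processed, commit them before partitions are revoked, and make sure every consumer polls often enough to take part in rebalances promptly.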
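The relationship between commit timing, loss, and duplication can be checked with a toy model of one partition handed from consumer A to consumer B in the middle of a batch. The strategy names are hypothetical labels, not Kafka settings. In the Java client, the third pattern roughly corresponds to committing processed offsets in ConsumerRebalanceListener.onPartitionsRevoked.

```python
"""Toy model of one partition handed from consumer A to consumer B mid-batch.

The strategy names are hypothetical labels, not Kafka settings. B always
resumes from the last committed offset, as a new partition owner does.
"""


def hand_off(strategy, total=10, batch=5, done_before_handoff=2):
    """Return the list of offsets processed by A and then B, in order."""
    committed = 0
    polled = list(range(committed, min(committed + batch, total)))
    processed = []
    if strategy == "commit_then_process":
        # Commit the whole batch as soon as it is polled, then get interrupted.
        committed = polled[-1] + 1
        processed += polled[:done_before_handoff]
    elif strategy == "process_then_crash":
        # Process the whole batch, but leave the group before committing.
        processed += polled
    elif strategy == "commit_on_revoke":
        # Commit exactly what was processed when the partition is revoked.
        processed += polled[:done_before_handoff]
        committed += len(processed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # B takes over and resumes from the committed offset.
    processed += range(committed, total)
    return processed


def summarize(processed, total=10):
    """Return (offsets never processed, number of duplicate processings)."""
    lost = sorted(set(range(total)) - set(processed))
    duplicates = len(processed) - len(set(processed))
    return lost, duplicates


if __name__ == "__main__":
    for s in ("commit_then_process", "process_then_crash", "commit_on_revoke"):
        print(s, summarize(hand_off(s)))
```

Committing before processing loses offsets 2 to 4, processing without committing duplicates five messages, and committing exactly the processed offsets on revocation avoids both in this model.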