Code Room
System design · Hard
Question
Design the partition assignment and rebalancing coordination for a consumer-group system processing a 2,000-partition event stream across an autoscaling pool of consumers (5–100 nodes). Each partition must be owned by exactly one consumer at a time, ownership must rebalance when nodes join or leave, and rebalancing must not cause a long global stop-the-world pause during which no partitions are being processed. How do you assign partitions and minimize disruption during rebalances?
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
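As one illustration of the depth the assignment step can be taken to, here is a minimal sketch of a "sticky" assignment, assuming a single coordinator that knows the previous partition-to-owner map and the current live membership. The function name `sticky_rebalance` and its parameters are hypothetical, not part of the question. It covers only the computation of the new assignment; the coordination protocol around it (revoking a partition from its old owner before granting it to the new one, fencing stale owners) is assumed to exist separately.

```python
def sticky_rebalance(current, consumers, num_partitions):
    """Compute a balanced assignment that moves as few partitions as possible.

    current: dict mapping partition id -> previous owner (may reference
             departed consumers, or be missing partitions entirely).
    consumers: iterable of live consumer ids.
    Returns a dict mapping every partition id -> live owner.
    """
    consumers = sorted(consumers)
    if not consumers:
        raise ValueError("need at least one live consumer")
    live = set(consumers)
    base, extra = divmod(num_partitions, len(consumers))

    # Keep partitions whose owner is still alive; everything else is orphaned.
    owned = {c: [] for c in consumers}
    orphans = []
    for p in range(num_partitions):
        owner = current.get(p)
        if owner in live:
            owned[owner].append(p)
        else:
            orphans.append(p)

    # Give the "+1" quotas to the consumers already holding the most
    # partitions, so fewer partitions have to be revoked.
    by_load = sorted(consumers, key=lambda c: (-len(owned[c]), c))
    quota = {c: base + (1 if i < extra else 0) for i, c in enumerate(by_load)}

    # Revoke only the excess above each consumer's quota.
    for c in consumers:
        while len(owned[c]) > quota[c]:
            orphans.append(owned[c].pop())

    # Hand orphaned and revoked partitions to consumers below quota.
    orphans.sort()
    pending = iter(orphans)
    for c in consumers:
        while len(owned[c]) < quota[c]:
            owned[c].append(next(pending))

    return {p: c for c, parts in owned.items() for p in parts}


if __name__ == "__main__":
    five = [f"c{i}" for i in range(5)]
    before = sticky_rebalance({}, five, 2000)
    after = sticky_rebalance(before, five + ["c5"], 2000)
    moved = sum(1 for p in range(2000) if before[p] != after[p])
    print(f"partitions moved when a 6th consumer joins: {moved}")
```

In this sketch, when a sixth consumer joins a five-node group, 333 of the 2,000 partitions change owner and the remaining 1,667 keep processing on their existing owners, which is the property the question asks for. The trade-off is that the coordinator must keep the previous assignment as state, unlike a stateless scheme such as hashing partition ids modulo the consumer count.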