Code Room
System design | Hard | sd-g214
Subject: Quorum
Level: Senior–Staff
~45 min
Common in: Algorithms & data structures interviews
Industries: Technology, Software development

Question

Design the replica-repair strategy for a distributed time-series/metrics store (N=3 replicas per partition, eventual consistency, tunable quorums) ingesting 2M writes/sec. Over time replicas drift: dropped writes, hinted-handoff replays that never landed, nodes that were down for hours. You need replicas to re-converge without re-streaming terabytes of data on every check, and without a foreground read having to fix everything. How do you detect and repair divergence efficiently?

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts (data model, bottlenecks, consistency, failure modes) and name the trade-offs you are making.
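As one illustration of going deep on a hard part, here is a minimal sketch of Merkle-tree anti-entropy, a common way to find divergent ranges without re-streaming a whole partition. It is one possible approach, not the expected answer. It assumes each partition is split into a fixed, power-of-two number of time buckets and that points are (series, timestamp, value) tuples. The names leaf_digest, build_tree and diff_leaves are hypothetical, chosen only for this example.

```python
import hashlib

def leaf_digest(points):
    """Hash one bucket's points in a canonical (sorted) order."""
    h = hashlib.sha256()
    for series, ts, value in sorted(points):
        h.update(f"{series}|{ts}|{value!r}\n".encode())
    return h.digest()

def build_tree(buckets, num_leaves):
    """Return tree levels, leaves first.

    buckets maps leaf index -> list of points; missing buckets hash as empty.
    Assumption: num_leaves is a power of two.
    """
    level = [leaf_digest(buckets.get(i, [])) for i in range(num_leaves)]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children's digests.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def diff_leaves(a, b):
    """Return indices of leaves that differ between two equal-shaped trees.

    Descends from the root and skips any subtree whose digests match,
    so matching ranges cost one comparison instead of a full scan.
    """
    stack = [(len(a) - 1, 0)]  # (level, index), starting at the root
    out = []
    while stack:
        depth, idx = stack.pop()
        if a[depth][idx] == b[depth][idx]:
            continue
        if depth == 0:
            out.append(idx)
        else:
            stack.append((depth - 1, 2 * idx))
            stack.append((depth - 1, 2 * idx + 1))
    return sorted(out)

# Replica B is missing one point in bucket 3.
replica_a = {0: [("cpu", 1, 0.5)], 3: [("cpu", 40, 0.7), ("cpu", 41, 0.9)]}
replica_b = {0: [("cpu", 1, 0.5)], 3: [("cpu", 40, 0.7)]}
print(diff_leaves(build_tree(replica_a, 8), build_tree(replica_b, 8)))  # [3]
```

Only the buckets reported as different would then be streamed between replicas. A real design would also have to decide how trees are kept current under 2M writes/sec (for example, building them only over sealed, immutable time windows), which this sketch does not address.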
