Code Room
System design · Hard
Question
Design tail-latency control for a sharded search system where a single query fans out to 200 shards and the response can only return once every shard replies. The slowest shard (the straggler) therefore sets the user-visible latency. At 200 shards, hitting some shard's p99 tail becomes the common case rather than the exception: if each shard independently exceeds its own p99 1% of the time, the chance that at least one of the 200 does on a given query is 1 - 0.99^200 ≈ 87%. The product needs a hard p99 < 150 ms even when a few shards are slow (GC pause, hot partition, degraded node). How do you stop one slow shard from dictating every query's latency while keeping results good enough?
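The fan-out arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes shards are slow independently of each other; real stragglers are often correlated (for example, several shards on one degraded node), so treat the numbers as a best case. The function names are hypothetical. The sketch computes how often a wait-for-all query hits at least one straggler, then inverts that to find how rare a slow (over 150 ms) shard reply must be for the whole query to stay under 150 ms at p99.

```python
def p_any_straggler(n_shards: int, p_slow: float) -> float:
    """Probability that at least one of n_shards is slow on a query.

    Assumes (hypothetically) that each shard is slow independently
    with probability p_slow, and the query waits for every shard.
    """
    return 1.0 - (1.0 - p_slow) ** n_shards


def required_shard_tail(n_shards: int, query_tail: float) -> float:
    """Largest per-shard probability of blowing the latency budget that
    still keeps the whole-query miss rate at or below query_tail.

    Inverts p_any_straggler under the same independence assumption.
    """
    return 1.0 - (1.0 - query_tail) ** (1.0 / n_shards)


if __name__ == "__main__":
    n = 200
    # Each shard exceeds its own p99 latency 1% of the time by definition.
    print(f"P(query hits a straggler) = {p_any_straggler(n, 0.01):.3f}")
    # For query p99 < 150 ms, how rare must a >150 ms shard reply be?
    q = required_shard_tail(n, 0.01)
    print(f"per-shard miss budget = {q:.2e} (about p{100 * (1 - q):.3f})")
```

With 200 shards, a 1% per-shard tail becomes roughly an 87% per-query tail, and meeting a 1% per-query budget requires each shard to exceed 150 ms on only about 0.005% of replies (around its p99.995). That gap is why simply waiting on every shard makes the target very hard to hit.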
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.