Code Room
System design · Hard
Question
A payments API is approaching overload during a flash sale: incoming traffic exceeds capacity, and the naive behavior (accept everything) drives latency up and triggers client retries that add still more load (a retry storm, or metastable failure). Design adaptive traffic shaping and load shedding that keeps the system in a healthy regime. Cover how you decide what to shed and what to admit, how you prioritize (for example, a checkout versus a balance poll), how you prevent retries from amplifying overload, and how the system recovers once it has tipped over.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
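As one possible starting point for the admission-control part of an answer, here is a minimal sketch, not a reference solution. It assumes three hypothetical priority tiers, a p99 latency signal sampled once per control interval, and a simple retry budget; the names `Priority`, `AdmissionController`, `target_latency_ms`, and `retry_budget_ratio` are all invented for illustration. It sheds the lowest tier first when latency exceeds the target, readmits one tier at a time only when there is clear headroom (so recovery does not re-trigger overload), and caps retries as a fraction of first attempts.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical tiers; a lower value means more important traffic.
    CHECKOUT = 0
    REFUND = 1
    BALANCE_POLL = 2


@dataclass
class AdmissionController:
    target_latency_ms: float = 200.0  # assumed latency objective
    max_level: int = 3                # 3 admits every tier
    min_level: int = 1                # never shed the top tier in this sketch
    level: int = 3                    # requests with priority < level pass
    retry_budget_ratio: float = 0.1   # assumed: retries per first attempt
    first_attempts: int = 0
    retries_admitted: int = 0

    def observe(self, p99_latency_ms: float) -> None:
        """Adjust the shedding cutoff once per control interval."""
        if p99_latency_ms > self.target_latency_ms:
            # Over target: shed one more tier, lowest priority first.
            self.level = max(self.min_level, self.level - 1)
        elif p99_latency_ms < 0.5 * self.target_latency_ms:
            # Clear headroom: readmit a single tier, so recovery is gradual.
            self.level = min(self.max_level, self.level + 1)
        # Between 50% and 100% of target: hold steady (hysteresis band).

    def admit(self, priority: Priority, is_retry: bool = False) -> bool:
        """Return True if the request should be served, False if shed."""
        if priority >= self.level:
            return False
        if is_retry:
            # Retry budget: reject retries once they exceed a fixed share of
            # first attempts. Counters here never reset; a real system would
            # track them over a sliding window.
            budget = self.retry_budget_ratio * self.first_attempts
            if self.retries_admitted >= budget:
                return False
            self.retries_admitted += 1
        else:
            self.first_attempts += 1
        return True


controller = AdmissionController()
controller.observe(500.0)  # latency well above target, so shed a tier
print(controller.admit(Priority.BALANCE_POLL))  # polls are shed first
print(controller.admit(Priority.CHECKOUT))      # checkouts still pass
```

A fuller answer would still need to say where this runs (edge proxy versus service), how shed requests are signalled to clients (for instance a 429 or 503 with backoff guidance), and how the latency signal is kept from being dominated by the requests that were already shed.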