Code Room
On-call · Medium · oc-g041
Subject: Retry amplification · Level: Mid–Senior · ~35 min · Common in Reliability & on-call interviews · Industries: Technology, Software development

Question

A request flows A → B → C. C has intermittently slow calls (its p99 is 6s). A's client timeout to B is 2s with 2 retries; B's client timeout to C is 5s with 2 retries. When C is slow, you observe that A times out at 2s and retries while B is still working on the first request (B can keep going for up to 5s per attempt across its own retries), so the backend does far more work against C than the user ever waits for. C's load climbs even though user traffic is flat. Triage the incident and explain the timeout/retry misconfiguration.
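The numbers in the scenario can be worked through as a quick calculation. This is a minimal sketch under stated assumptions: "2 retries" is read as 3 attempts in total, backoff between attempts is ignored, B does not cancel its work when A gives up, and every call to C runs past B's 5s timeout (the worst case). The `Hop` class and its field names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """Client-side policy for one caller -> callee hop (hypothetical helper)."""
    timeout_s: float
    retries: int  # retries after the first attempt (assumed meaning of "2 retries")

    @property
    def attempts(self) -> int:
        return 1 + self.retries

    @property
    def worst_case_s(self) -> float:
        # Assumption: no backoff or jitter between attempts.
        return self.attempts * self.timeout_s


a_to_b = Hop(timeout_s=2.0, retries=2)
b_to_c = Hop(timeout_s=5.0, retries=2)

# Without cancellation, every attempt from A starts a B request that
# runs its own full retry loop against C.
max_calls_to_c = a_to_b.attempts * b_to_c.attempts

print(a_to_b.worst_case_s)  # longest the user-facing caller waits: 6.0
print(b_to_c.worst_case_s)  # longest one B request keeps working: 15.0
print(max_calls_to_c)       # worst-case calls to C per user request: 9
```

Under these assumptions, one user request can produce up to nine calls to C, and each B request may keep working for 15s after a caller that gave up at 2s. That is consistent with C's load climbing while user traffic stays flat.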

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
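For the "what prevents a repeat" part, one commonly used approach is deadline propagation: the downstream hop never waits or retries beyond the time its caller has left. The sketch below is illustrative only and is not taken from the question; `call_with_deadline`, `DeadlineExceeded`, `FakeClock`, and `slow_c` are hypothetical names, and the fake clock stands in for real time so the example is deterministic.

```python
import time
from typing import Callable, List


class DeadlineExceeded(Exception):
    """Raised when the caller's time budget is used up (hypothetical)."""


def call_with_deadline(
    call: Callable[[float], str],
    deadline: float,
    per_try_timeout_s: float,
    max_attempts: int,
    now: Callable[[], float] = time.monotonic,
) -> str:
    """Retry `call` only while the caller's deadline leaves time for it."""
    for _ in range(max_attempts):
        remaining = deadline - now()
        if remaining <= 0:
            break  # the upstream caller has already given up
        try:
            # Never wait on the callee longer than the caller will wait on us.
            return call(min(per_try_timeout_s, remaining))
        except TimeoutError:
            continue
    raise DeadlineExceeded("no time budget left")


class FakeClock:
    """Deterministic clock for the demo."""
    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()
timeouts_seen: List[float] = []


def slow_c(timeout_s: float) -> str:
    # Stand-in for a slow C: it consumes the whole timeout, then fails.
    timeouts_seen.append(timeout_s)
    clock.t += timeout_s
    raise TimeoutError


try:
    # B handles a request from A, which will wait only 2s.
    call_with_deadline(slow_c, deadline=clock() + 2.0,
                       per_try_timeout_s=5.0, max_attempts=3, now=clock)
except DeadlineExceeded:
    pass

print(timeouts_seen)  # [2.0]
```

In this sketch B makes a single 2s call to C per request from A instead of up to three 5s calls, because the second attempt finds no budget left. A real system would also need to carry the deadline across the wire and cap retries globally (for example with a retry budget); those parts are left out here.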

Diagram & narrate the incident