Code Room
On-callMedium
Question
Your backend calls an internal inventory API that enforces a 2,000 rps per-service quota. After a client-library upgrade shipped this morning, your inventory calls start getting throttled (429s) even though your user traffic is flat. Dashboards show your outbound inventory request rate doubled at the exact deploy time — with no increase in incoming user requests. The new client library changed the default: a failed call now retries up to 5 times with no backoff, and a borderline-slow inventory shard causes a trickle of failures. How do you triage and mitigate?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
Learn the concepts
Loading whiteboard…
Run or narrate your approach, then ask the coach.