Code Room
On-call · Medium
Question
You front a fragile legacy downstream with a token-bucket rate limiter configured for 500 rps with a burst (bucket) size of 50,000 — chosen long ago to 'be forgiving.' Normally fine. Today a batch job fires a sudden 40,000-request burst; your limiter passes almost all of it instantly (the big bucket had refilled overnight), the legacy downstream — which can only really handle ~600 rps — gets buried, times out, and cascades 503s back to you. The average rate over the minute was well under 500 rps. How do you triage and fix the limiter behavior?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
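To make the root cause concrete, here is a minimal sketch of a token bucket, not the limiter actually deployed. It assumes the 40,000 requests arrive evenly spread across one second and that the bucket starts full. The class name, the helper function, and the smaller capacity of 100 are hypothetical choices for illustration. It suggests that the burst size, not the 500 rps refill rate, decides how much traffic reaches the downstream in the worst second.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float        # tokens added per second (sustained limit)
    capacity: float    # maximum tokens the bucket can hold (burst size)
    tokens: float      # tokens currently available
    last: float = 0.0  # time of the previous refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, never exceeding the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Spend one token per admitted request.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def admitted_in_first_second(capacity: float, burst: int = 40_000) -> int:
    # Assumption: the bucket is full and the burst is spread evenly over 1 s.
    bucket = TokenBucket(rate=500, capacity=capacity, tokens=capacity)
    return sum(bucket.allow(now=i / burst) for i in range(burst))


# Configuration from the incident: the full 50,000-token bucket admits the
# entire burst within one second.
print(admitted_in_first_second(50_000))

# Hypothetical smaller bucket: roughly capacity + rate requests get through
# in the worst second, which stays near the downstream's ~600 rps ceiling.
print(admitted_in_first_second(100))
```

Under these assumptions, a rough sizing rule is that refill rate plus burst size should not exceed what the downstream can absorb in its worst one-second window. Whether rejected requests are dropped, queued, or retried with backoff is a separate design choice that this sketch does not cover.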