Code Room
On-callMedium
Question
Right after a config push to your Envoy edge proxies, legitimate users start getting 429 `Too Many Requests` even at normal, low traffic. The Envoy dashboards show the local rate-limit filter rejecting a large fraction of requests, and the rate-limit counter is keyed on a header that's the same for everyone behind a corporate NAT — so thousands of distinct users share one bucket. The intended limit was per-API-key; the config push changed the descriptor key. Error rate is high but backend CPU is low and there's no real overload. How do you triage and fix?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
Learn the concepts
Loading whiteboard…
Run or narrate your approach, then ask the coach.