Question
To cut tail latency you enabled hedged requests on a hot internal RPC: if the first attempt doesn't respond within the p95 (~15ms), the client fires a second request to another replica and takes whichever returns first. It worked beautifully and shaved the tail. Then at peak this morning the backend tier tipped over: when the backend got mildly slow, request volume to it suddenly nearly DOUBLED, pushing it from 80% to 100% CPU, which made more requests cross the 15ms threshold, which fired more hedges — a runaway. There was no traffic spike from real users. How do you triage and mitigate, and how should hedging be configured safely?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.