On-call · Hard · oc-g503

Subject: Upstream timeout
Level: Senior–Staff
~40 min
Common in: Networking & APIs · Distributed systems interviews
Industries: Technology, Software development

Question

A request flows gateway → A → B → C (all gRPC). The gateway sets a 2s deadline. At 20:00, C develops a latency tail (p99.9 ~5s). The gateway times out at 2s and returns errors to users — expected. But you ALSO see C and B CPU climbing and their inbound request rate rising, even though the callers that triggered the work already gave up at 2s. Dashboards: B and C keep processing requests whose gateway deadline has long passed; retries from A add more load. How do you triage and mitigate?
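The gap the scenario describes (B and C working on requests nobody is waiting for) can be sketched as a small simulation. This is a minimal model, not gRPC code: `run_chain`, the hop tuples, and the simulated clock are all hypothetical names introduced for illustration. It assumes the gateway's deadline travels with the request as an absolute expiry time and that each hop checks it before starting work; in a real service that remaining time would come from the RPC framework's call context rather than from a local variable.

```python
def run_chain(hops, deadline_s, start=0.0):
    """Simulate one request crossing a chain of services.

    hops: list of (name, queue_delay_s, work_s) tuples (hypothetical model).
    deadline_s: the budget set once at the edge (2s in the scenario).
    Returns (executed, shed_at): the hops that did work, and the name of
    the hop that refused the request because the deadline had passed
    (None if every hop ran).
    """
    expires_at = start + deadline_s  # absolute deadline, shared by every hop
    now = start
    executed = []
    for name, queue_delay, work in hops:
        now += queue_delay  # time spent queued before the handler runs
        if now >= expires_at:
            # The caller has already given up: doing the work is pure waste.
            return executed, name
        executed.append(name)
        now += work
    return executed, None


# Healthy path: the whole chain finishes well inside the 2s budget.
healthy = run_chain([("A", 0.0, 0.1), ("B", 0.0, 0.1), ("C", 0.1, 0.2)], 2.0)

# Incident path: the request waits 2.5s in C's queue, so C sheds it
# instead of burning CPU on a response nobody will read.
incident = run_chain([("A", 0.0, 0.1), ("B", 0.0, 0.1), ("C", 2.5, 0.2)], 2.0)
```

Without the expiry check, the incident request would still be executed by C after the gateway had returned an error, which matches the CPU climb seen on the dashboards.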

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
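One common way to stop the bleeding when retries amplify load is a retry budget on the caller (A in this scenario). The sketch below is an assumption about how such a budget could look, not a description of any particular library: the `RetryBudget` class, its method names, and the 10% default are hypothetical. It uses cumulative counters for brevity; a production version would more likely track a sliding time window.

```python
class RetryBudget:
    """Allow retries only while they stay under a percentage of first attempts.

    Hypothetical helper: counters are cumulative and never reset.
    """

    def __init__(self, retry_percent=10):
        self.retry_percent = retry_percent
        self.requests = 0  # first attempts seen so far
        self.retries = 0   # retries granted so far

    def record_request(self):
        """Call once for every first attempt sent downstream."""
        self.requests += 1

    def try_retry(self):
        """Return True and consume budget if a retry is allowed right now."""
        allowed = self.requests * self.retry_percent // 100
        if self.retries < allowed:
            self.retries += 1
            return True
        return False


# 100 first attempts all fail and each one asks for a retry:
budget = RetryBudget(retry_percent=10)
for _ in range(100):
    budget.record_request()
granted = sum(1 for _ in range(100) if budget.try_retry())
```

With this cap in place, a fully failing downstream sees at most about 10% extra traffic from retries instead of a doubling, which buys room to investigate the latency tail in C.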

Diagram & narrate the incident
