On-call · Hard · oc-g247

Subject: Head-of-line blocking
Level: Senior–Staff
Time: ~35 min
Common in Networking & APIs interviews
Industries: Technology, Software development

Question

After migrating your service-to-service calls from HTTP/1.1 (with a connection pool) to a single multiplexed HTTP/2 connection per upstream 'for efficiency,' your p99 latency to a critical upstream got worse under load, not better, and the degradation correlates with one specific slow endpoint on that upstream. Fast endpoints on the same upstream now also show elevated p99 whenever the slow endpoint is being hit hard. TCP retransmits are slightly up on the path. Nothing changed on the upstream itself. How do you triage and mitigate?
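One hypothesis that fits the retransmit signal is TCP-level head-of-line blocking: HTTP/2 multiplexes many streams over one TCP byte stream, and TCP releases bytes to the application strictly in order, so a single lost segment of the slow endpoint's large response also holds back fast streams' bytes queued behind it. It is not the only candidate (connection-level flow control and per-connection stream limits can couple streams too). The toy model below is a sketch under loudly stated assumptions: one segment sent per time unit, a one-unit path delay, a fixed hypothetical retransmission delay `RTO`, no congestion-control effects, and the old HTTP/1.1 pool approximated as one connection per stream. The names `completion_times` and `traffic` are illustrative only.

```python
RTO = 10  # hypothetical retransmission delay, in time units (assumption)


def completion_times(segments, rto=RTO):
    """Time at which each stream's last byte is released to the application.

    segments: (stream, send_time, lost) tuples in send order on ONE TCP connection.
    A lost segment arrives `rto` units late; TCP releases bytes strictly in order,
    so every later segment on the same connection waits for it.
    """
    done = {}
    released = 0
    for stream, sent, lost in segments:
        arrival = sent + 1 + (rto if lost else 0)  # 1 unit one-way path delay
        released = max(released, arrival)           # in-order delivery
        done[stream] = released
    return done


traffic = [
    ("slow", 0, True),     # a segment of the slow endpoint's large response is lost
    ("fast-a", 1, False),
    ("slow", 2, False),
    ("fast-b", 3, False),
]

# HTTP/2 as deployed: every stream shares one TCP connection.
shared = completion_times(traffic)

# Old HTTP/1.1 pool, approximated as one connection per stream:
# the loss only stalls the connection that carried it.
pooled = {}
for stream in sorted({s for s, _, _ in traffic}):
    pooled.update(completion_times([seg for seg in traffic if seg[0] == stream]))

print("shared:", shared)
print("pooled:", pooled)
```

In this toy trace, fast-a and fast-b are released at t=11 on the shared connection versus t=2 and t=4 on their own connections, even though none of their own segments were lost; the slow stream finishes at t=11 either way.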

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
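For this scenario, one mitigation worth sketching is a bulkhead: give the slow endpoint its own connection (or pool) so its long-held streams cannot occupy the slots that fast requests need. The model below is a minimal sketch, assuming each connection allows at most `max_streams` concurrent requests (standing in for the server's advertised HTTP/2 `SETTINGS_MAX_CONCURRENT_STREAMS`) and that the client queues requests FIFO when the limit is reached rather than opening another connection; real client behavior varies. The limit of 2, the timings, and the names `latencies`, `requests`, `shared`, and `isolated` are all hypothetical.

```python
import heapq

MAX_STREAMS = 2  # hypothetical per-connection stream limit, kept tiny for readability


def latencies(requests, max_streams=MAX_STREAMS):
    """Per-request latency on one connection that queues when all stream slots are busy.

    requests: (arrival, duration, name) tuples sorted by arrival time.
    Each request takes the earliest-free slot in FIFO order and holds it for `duration`.
    """
    free_at = [0] * max_streams  # min-heap: time at which each stream slot frees up
    result = {}
    for arrival, duration, name in requests:
        slot_free = heapq.heappop(free_at)
        start = max(arrival, slot_free)
        heapq.heappush(free_at, start + duration)
        result[name] = start + duration - arrival
    return result


requests = [
    (0, 100, "slow-1"),  # slow endpoint: long-held streams
    (1, 100, "slow-2"),
    (2, 5, "fast-1"),    # fast endpoint: short streams
    (3, 5, "fast-2"),
]

# Before: everything multiplexed on one connection.
shared = latencies(requests)

# Bulkhead: the slow endpoint gets its own connection, fast endpoints another.
isolated = {
    **latencies([r for r in requests if r[2].startswith("slow")]),
    **latencies([r for r in requests if r[2].startswith("fast")]),
}

print("shared:  ", shared)
print("isolated:", isolated)
```

Here the fast requests drop from 103 time units of latency on the shared connection to 5 once the slow endpoint is isolated, while the slow endpoint's own latency is unchanged. A concurrency cap on the slow endpoint, set below the stream limit, would leave slots free in a similar way.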
