On-callMediumoc-g688

Subject Load balancer tcpLevel Mid–Senior~30 minCommon in Networking & APIs interviewsIndustries Technology

Question

Every rolling deploy of a service causes a brief but sharp burst of client-facing 5xx errors and 'connection reset' / 'upstream closed connection' errors at the load balancer — lasting 10-20 seconds per batch of pods replaced — even though the new pods are healthy and the steady state is clean. The errors line up exactly with old pods terminating. The deployment replaces pods in batches; the LB health check passes for new pods quickly. The app receives SIGTERM and exits in ~1s. There's no connection draining configured and no preStop hook. Triage and explain the durable fix.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Learn the concepts

Diagram & narrate the incident

Loading whiteboard…

Run or narrate your approach, then ask the coach.