On-call · Hard · oc-g398

Subject: Ephemeral port exhaustion · Level: Senior–Staff · ~35 min · Common in Networking & APIs interviews · Industries: Technology, Software development

Question

A high-throughput aggregation service that fans out to a single internal pricing API starts intermittently failing outbound connects at peak with EADDRNOTAVAIL ('cannot assign requested address'), even though the box has plenty of CPU, memory, and free fds. `ss -s` shows ~28,000 sockets in TIME_WAIT, almost all to the *same* destination IP:port. The host's ephemeral range is the default 32768–60999. Nothing changed in infra, but a release three days ago switched the pricing client from a pooled keep-alive HTTP/1.1 client to a 'simpler' library that opens a fresh connection per call and closes it. Traffic to the box is up only ~10% week over week. Walk through triage and mitigation.
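A quick back-of-envelope check helps frame the numbers in the scenario. The sketch below assumes the client is the side that closes first (so it holds the TIME_WAIT entries), that TIME_WAIT lasts 60 seconds as on stock Linux, and that no reuse tuning is in effect. The function names are illustrative, not from any library.

```python
# Back-of-envelope capacity check for one (source IP, destination IP:port) pair.
# Assumptions: client closes first, TIME_WAIT is held for 60 s, no reuse tuning.

EPHEMERAL_LOW = 32768    # default range start, from the scenario
EPHEMERAL_HIGH = 60999   # default range end, from the scenario
TIME_WAIT_SECONDS = 60   # assumed hold time per closed connection


def ephemeral_port_count(low: int, high: int) -> int:
    """Number of source ports available in an inclusive range."""
    return high - low + 1


def max_fresh_connects_per_second(ports: int, time_wait_seconds: int) -> float:
    """Sustained connect rate at which every port is parked in TIME_WAIT."""
    return ports / time_wait_seconds


ports = ephemeral_port_count(EPHEMERAL_LOW, EPHEMERAL_HIGH)
ceiling = max_fresh_connects_per_second(ports, TIME_WAIT_SECONDS)

print(f"usable source ports to one destination: {ports}")
print(f"approximate connect-per-call ceiling: {ceiling:.1f} conn/s")
```

Under these assumptions the range holds 28,232 ports, which lines up with the ~28,000 TIME_WAIT sockets reported by `ss -s`, and the ceiling is roughly 470 new connections per second to the pricing API. A connect-per-call client sitting near that ceiling could be tipped over by a ~10% traffic increase, whereas a pooled keep-alive client would not come close.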

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
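One way to turn "real signals" into a hypothesis is to group TIME_WAIT sockets by peer and see whether a single destination dominates. The sketch below parses text in the shape printed by `ss -tan`. The sample rows and addresses are made up for illustration, and the column layout is assumed to be state, Recv-Q, Send-Q, local address, peer address.

```python
from collections import Counter


def time_wait_by_peer(ss_output: str) -> Counter:
    """Count TIME-WAIT sockets per peer address:port.

    Assumes rows look like: STATE RECV-Q SEND-Q LOCAL PEER.
    The header row and sockets in other states are skipped.
    """
    counts: Counter = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "TIME-WAIT":
            counts[fields[4]] += 1
    return counts


# Hypothetical sample; on a real host this text would come from `ss -tan`.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
TIME-WAIT  0      0      10.0.0.5:41234      10.0.1.9:8443
TIME-WAIT  0      0      10.0.0.5:41235      10.0.1.9:8443
ESTAB      0      0      10.0.0.5:50022      10.0.2.7:5432
TIME-WAIT  0      0      10.0.0.5:41236      10.0.1.9:8443
"""

counts = time_wait_by_peer(SAMPLE)
for peer, n in counts.most_common(5):
    print(f"{peer}\t{n}")
```

If one peer accounts for nearly all entries, as in the scenario, the evidence points at a single client's connection handling rather than a host-wide problem. That supports rolling back to the pooled keep-alive client as the first mitigation, ahead of any kernel tuning.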
