Question
A high-throughput aggregation service that fans out to a single internal pricing API starts intermittently failing outbound connects at peak with EADDRNOTAVAIL ('cannot assign requested address'), even though the box has plenty of CPU, memory, and free fds. `ss -s` shows ~28,000 sockets in TIME_WAIT, almost all to the *same* destination IP:port. The host's ephemeral range is the default 32768–60999. Nothing changed in infra, but a release three days ago switched the pricing client from a pooled keep-alive HTTP/1.1 client to a 'simpler' library that opens a fresh connection per call and closes it. Traffic to the box is up only ~10% week over week. Walk through triage and mitigation.
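The numbers in the scenario can be checked with a back-of-envelope calculation. This is a minimal sketch under stated assumptions. It assumes Linux's fixed 60-second TIME_WAIT (`TCP_TIMEWAIT_LEN`), a single source IP, no `net.ipv4.tcp_tw_reuse`, and that this host is the active closer: it closes each connection, so it holds the TIME_WAIT state. The function names are hypothetical.

```python
# Assumption: Linux holds TIME_WAIT for a fixed 60 s (TCP_TIMEWAIT_LEN).
TIME_WAIT_SECONDS = 60


def ephemeral_port_count(low: int, high: int) -> int:
    """Number of ports in an inclusive ip_local_port_range (hypothetical helper)."""
    return high - low + 1


def max_new_conns_per_sec(low: int, high: int,
                          time_wait_s: int = TIME_WAIT_SECONDS) -> float:
    """Steady-state ceiling on new outbound connections per second from ONE
    source IP to ONE destination IP:port. This assumes every connection ends
    in TIME_WAIT on this host and its port cannot be reused until it expires
    (hypothetical helper)."""
    return ephemeral_port_count(low, high) / time_wait_s


# Default range from the scenario: 32768-60999.
ports = ephemeral_port_count(32768, 60999)
rate = max_new_conns_per_sec(32768, 60999)
print(ports)           # 28232
print(round(rate, 1))  # 470.5
```

Under these assumptions, the ~28,000 TIME_WAIT sockets reported by `ss -s` already account for nearly the entire 28,232-port budget for that one destination. A per-call connection pattern that sustains more than roughly 470 new connections per second to it would therefore run out of ports and fail with EADDRNOTAVAIL. The ceiling applies per destination IP:port, which is consistent with the TIME_WAIT sockets being concentrated on a single destination.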
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.