Code Room
On-call | Hard | oc-g397
Subject: Fd exhaustion
Level: Senior–Staff
~35 min
Common in Networking & APIs interviews
Industries: Technology, Software development

Question

A Go payments-orchestration service slowly climbs toward its 65,536 fd soft limit over ~30 hours and then starts failing outbound calls with 'dial tcp: too many open files', forcing an hourly restart that masked the issue until this week. The open-fd graph rises in a near-perfect straight line that is *independent of traffic* — it keeps climbing at the same slope even during the overnight trough. `lsof` shows thousands of sockets in CLOSE_WAIT to one internal dependency, and a heap profile shows a growing number of live `http.Transport` objects. The only recent change is a refactor two weeks ago that moved HTTP calls into a per-request helper. How do you triage this and stop the bleed?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
