On-call · Hard · oc-g412

Subject Fd exhaustion
Level Senior–Staff
~35 min
Common in Distributed systems interviews
Industries Technology, Software development

Question

A Node.js metrics-ingestion service climbs toward its fd limit over ~12 hours and then starts emitting EMFILE ('too many open files'), at which point it can neither accept connections nor open new files. The open-fd graph rises with the *rate of config-reload events*, not with request traffic — it jumps each time an upstream config changes. `lsof` shows a growing number of anon_inode `[eventpoll]` and inotify fds. A change last sprint added per-config-change logic that creates a new `fs.watch`/file watcher (and a timer) on each reload but never closes the previous one. CPU/mem are fine. Triage and mitigate.
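The `lsof` observation can also be checked from inside the process. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/fd` is readable; `summarizeFds` and `readFdTargets` are hypothetical helper names, not part of the service in the question. It groups open fds by kind so that growth in `inotify` or `[eventpoll]` entries stands out from sockets and regular files.

```javascript
const fs = require('node:fs');

// Group fd link targets (the strings that /proc/<pid>/fd symlinks point to)
// by kind, e.g. "anon_inode:inotify" -> "inotify", "socket:[123]" -> "socket".
function summarizeFds(targets) {
  const counts = {};
  for (const target of targets) {
    let kind;
    if (target.startsWith('anon_inode:')) {
      kind = target.slice('anon_inode:'.length); // "inotify", "[eventpoll]", ...
    } else if (target.startsWith('socket:')) {
      kind = 'socket';
    } else if (target.startsWith('pipe:')) {
      kind = 'pipe';
    } else {
      kind = 'file'; // anything else is treated as a path on disk
    }
    counts[kind] = (counts[kind] || 0) + 1;
  }
  return counts;
}

// Linux-only assumption: read the symlink target of every fd of a process.
function readFdTargets(pid = 'self') {
  const dir = `/proc/${pid}/fd`;
  const targets = [];
  for (const name of fs.readdirSync(dir)) {
    try {
      targets.push(fs.readlinkSync(`${dir}/${name}`));
    } catch {
      // The fd was closed between readdir and readlink; skip it.
    }
  }
  return targets;
}
```

Calling `summarizeFds(readFdTargets())` before and after a config reload, or exporting the counts as a gauge, would show which kind of fd grows with each reload.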

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
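For the root-cause fix, one possible shape is to keep a single owner for the watcher and timer and release the previous pair before creating a new one. This is a sketch under stated assumptions, not the service's actual code: the class name `ReloadWatcher`, the `onChange` callback, and the 30-second interval are all hypothetical.

```javascript
const fs = require('node:fs');

// Hypothetical owner of the one live watcher and timer for the current config.
class ReloadWatcher {
  constructor(onChange, intervalMs = 30000) {
    this.onChange = onChange;     // assumed callback invoked on change or poll
    this.intervalMs = intervalMs; // assumed polling interval
    this.watcher = null;
    this.timer = null;
  }

  // Called on every config-reload event. Closing first means at most one
  // watcher and one timer exist at any time, however many reloads occur.
  reload(path) {
    this.close();
    this.watcher = fs.watch(path, (eventType) => this.onChange(eventType));
    this.watcher.on('error', (err) => console.error('watch error:', err));
    this.timer = setInterval(() => this.onChange('poll'), this.intervalMs);
    this.timer.unref(); // do not keep the process alive just for the timer
  }

  // Release both handles; safe to call repeatedly and on shutdown.
  close() {
    if (this.watcher) {
      this.watcher.close();
      this.watcher = null;
    }
    if (this.timer) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
}
```

Until a fix like this ships, restarting the process or rolling back the change stops the bleeding; raising the fd limit only delays EMFILE. An alert on open-fd count as a fraction of the limit is one way to catch a repeat early.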

