Question
A Node.js metrics-ingestion service climbs toward its fd limit over ~12 hours and then starts emitting EMFILE ('too many open files'), at which point it can neither accept connections nor open new files. The open-fd graph rises with the *rate of config-reload events*, not with request traffic: it jumps each time an upstream config changes. `lsof` shows a growing number of anon_inode `[eventpoll]` and inotify fds. A change last sprint added reload logic that creates a new `fs.watch` file watcher (and a timer) on each config change but never closes the previous one. CPU and memory are fine. Triage and mitigate.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
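
To make the suspected leak concrete, here is a minimal sketch of a reload handler that releases the previous watcher and timer before creating new ones. It is not the service's actual code: `createReloadManager`, `configPath`, `onChange`, `intervalMs`, and the injectable `watch` parameter are all hypothetical names chosen for illustration. Only `fs.watch`, `FSWatcher.close()`, `setInterval`, `clearInterval`, and `Timeout.unref()` are real Node.js APIs.

```javascript
'use strict';
const fs = require('node:fs');

// Hypothetical helper: owns at most one watcher and one timer at any time.
// `watch` defaults to fs.watch and is injectable so the logic can be tested
// without touching the filesystem.
function createReloadManager({ configPath, onChange, intervalMs = 30000, watch = fs.watch }) {
  let watcher = null;
  let timer = null;

  // Close whatever the previous reload created. Safe to call repeatedly.
  function release() {
    if (watcher) {
      watcher.close(); // frees the underlying inotify fd on Linux
      watcher = null;
    }
    if (timer) {
      clearInterval(timer);
      timer = null;
    }
  }

  // Called on every config-change event.
  function reload() {
    release(); // the step the leaking version is assumed to skip
    watcher = watch(configPath, onChange);
    timer = setInterval(() => onChange('poll', configPath), intervalMs);
    timer.unref(); // do not keep the process alive for this timer alone
  }

  return { reload, release, isActive: () => watcher !== null };
}
```

With this shape, calling `reload()` any number of times leaves exactly one watcher and one timer open, and `release()` can be wired into shutdown handling. On Linux, logging the number of entries in `/proc/self/fd` before and after a forced reload is one way to check that the count stays flat once such a fix is deployed.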