Code Room
On-callHardoc-g192
Subject Resource exhaustionLevel Senior–Staff~40 minCommon in Concurrency interviewsIndustries Software development

Question

A CI build host starts failing new builds with 'fork: retry: Resource temporarily unavailable' and SSH logins are refused, though CPU, memory, and disk all look fine. The host runs many short-lived containerized build steps. `ps -eLf | wc -l` shows ~30,000 tasks; `cat /proc/sys/kernel/pid_max` is 32768. A build job introduced last week spawns a per-test helper process but, on a code path hit by a flaky test, never reaps the children — `ps` shows thousands of <defunct> (zombie) processes parented to that job. Triage and fix.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Diagram & narrate the incident
Loading whiteboard…
Run or narrate your approach, then ask the coach.