Code Room
On-call · Medium
Question
A self-hosted CI runner host starts failing jobs at 09:00 with 'no space left on device' during `docker build` and `git checkout`, but `df -h` on /var/lib/docker shows only 40% of bytes used. `df -i` shows that filesystem at 100% inode usage. The host runs hundreds of short-lived containers a day; image pulls and intermediate build layers (each a tree of many small files in overlay2) have accumulated, and neither `docker image prune` nor container cleanup has ever been scheduled. There has been no recent deploy, just steadily more pipelines over months. Triage and recover.
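The gap between the two `df` readings can be checked directly. The sketch below is one possible way to confirm inode exhaustion and see which subdirectory holds the most entries. The `DOCKER_ROOT` default comes from the question; the helper names `inode_use_pct`, `count_entries`, and `rank_subdirs` are hypothetical, and the parsing assumes GNU `df` column layout.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions, not an existing tool.
DOCKER_ROOT="${DOCKER_ROOT:-/var/lib/docker}"

# Print the inode use percentage (IUse%) of the filesystem that holds path $1.
# Assumes GNU df, where column 5 of `df -Pi` is IUse%.
inode_use_pct() {
    df -Pi "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Count filesystem entries (files, directories, symlinks) under $1 without
# crossing into other mounts. Roughly one inode each (hard links share one).
count_entries() {
    find "$1" -xdev | wc -l
}

# List the immediate subdirectories of $1 by entry count, largest last.
rank_subdirs() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue
        printf '%s\t%s\n' "$(count_entries "$d")" "$d"
    done | sort -n
}
```

In the scenario described, `inode_use_pct "$DOCKER_ROOT"` would be expected to print `100`, and `rank_subdirs "$DOCKER_ROOT"` (run as root) would likely put `overlay2` last, since that is where the small-file layer trees live. Walking a tree this large can take a while; on hosts with a recent GNU coreutils, `du --inodes` gives similar counts.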
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
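As a rough illustration of the mitigation step, the sketch below frees inodes with Docker's own prune commands, ordered from least to most disruptive. The `DRY_RUN` switch, the `KEEP_WINDOW` cutoff of 24h, and the function names `run` and `reclaim_inodes` are assumptions made for this example; the right retention window depends on how much cache the pipelines rely on.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, KEEP_WINDOW, and the function names are assumptions.
DRY_RUN="${DRY_RUN:-1}"            # default: print commands instead of running them
KEEP_WINDOW="${KEEP_WINDOW:-24h}"  # keep anything created within this window

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

# Free inodes in order of increasing blast radius:
# stopped containers, then build cache, then unused images.
reclaim_inodes() {
    run docker container prune -f
    run docker builder prune -af --filter "until=$KEEP_WINDOW"
    run docker image prune -af --filter "until=$KEEP_WINDOW"
}
```

Calling `reclaim_inodes` with the default `DRY_RUN=1` only prints the three commands, which is a convenient thing to paste into the incident channel before acting. After setting `DRY_RUN=0` and running it, re-check `df -i` and retry a failed job. Pruning images means the next builds start with a cold cache, so expect slower pipelines for a while. To prevent a repeat, the same prune could be scheduled (cron or a systemd timer) and inode usage added to disk alerts alongside byte usage.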