On-call · Medium · oc-g413

Subject: Inode exhaustion
Level: Mid–Senior
Time: ~30 min
Common in: Reliability & on-call interviews
Industries: Technology, IT services

Question

A self-hosted CI runner host starts failing jobs at 09:00 with "No space left on device" during `docker build` and `git checkout`, yet `df -h` on /var/lib/docker shows only 40% of bytes used. `df -i` shows the same filesystem at 100% inode usage. The host runs hundreds of short-lived containers a day. Image pulls and intermediate build layers (each a tree of many small files in overlay2) have accumulated, and neither `docker image prune` nor container cleanup has ever been scheduled. There was no recent deploy; pipeline volume has simply grown steadily over months. Triage and recover.
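The gap between what `df -h` and `df -i` report can be reproduced in a few lines. The sketch below is illustrative, not part of the original question: it assumes a Linux host with Python 3, the helper names `usage` and `diagnose` and the 95% threshold are hypothetical, and the percentages only approximate what `df` prints.

```python
import os


def usage(path):
    """Return (byte_pct, inode_pct) for the filesystem that contains path."""
    st = os.statvfs(path)

    # Bytes: used blocks over (used + blocks available to unprivileged users).
    used_blocks = st.f_blocks - st.f_bfree
    denom = used_blocks + st.f_bavail
    byte_pct = 100.0 * used_blocks / denom if denom else 0.0

    # Inodes: some filesystems report zero total inodes, so guard the division.
    used_inodes = st.f_files - st.f_ffree
    inode_pct = 100.0 * used_inodes / st.f_files if st.f_files else None
    return byte_pct, inode_pct


def diagnose(byte_pct, inode_pct, threshold=95.0):
    """Name the exhausted resource, checking inodes before bytes."""
    if inode_pct is not None and inode_pct >= threshold:
        return "inodes"
    if byte_pct >= threshold:
        return "bytes"
    return "ok"


if __name__ == "__main__":
    b, i = usage("/var/lib/docker")  # path taken from the scenario
    print(f"bytes={b:.0f}% inodes={i} -> {diagnose(b, i)}")
```

With the numbers from the scenario, `diagnose(40.0, 100.0)` returns `"inodes"`, which is why the write fails even though most of the disk is empty.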

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
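One way to turn "form hypotheses from real signals" into something concrete is to measure where the inodes actually went before deleting anything. The following is a minimal sketch under stated assumptions: Linux, Python 3, and read access to the directory being inspected. The function name `entries_by_child` is hypothetical, and it counts directory entries, so hard-linked files are counted once per link rather than once per inode.

```python
import os
from collections import Counter


def entries_by_child(root):
    """Count directory entries under each immediate child of root.

    Returns (name, count) pairs sorted from largest to smallest.
    Symlinks are counted as entries but never followed.
    """
    counts = Counter()
    for child in os.scandir(root):
        total = 1  # the child itself
        if child.is_dir(follow_symlinks=False):
            # os.walk does not follow symlinked directories by default.
            for _dirpath, dirnames, filenames in os.walk(child.path):
                total += len(dirnames) + len(filenames)
        counts[child.name] = total
    return counts.most_common()


if __name__ == "__main__":
    # Path taken from the scenario; reading it normally requires root.
    for name, count in entries_by_child("/var/lib/docker")[:5]:
        print(f"{count:>10}  {name}")
```

If the top entry is `overlay2` by a wide margin, that supports the hypothesis that accumulated image and build layers, not workspace checkouts, are the root cause, and it tells you which cleanup to run first. On a filesystem with millions of small files the walk itself can take a while, so it is best run after the immediate mitigation, not before.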
