Code Room
On-callMediumoc-g195
Subject Disk fullLevel Mid–Senior~30 minCommon in Storage & CDN interviewsIndustries Software development

Question

A long-running Docker host (used as a self-hosted runner) starts failing container starts with 'no space left on device' on /var/lib/docker, and `df -h` confirms that filesystem is full. The host has been up for months running thousands of CI jobs. `docker system df` shows large 'RECLAIMABLE' figures: dozens of GB in dangling images, stopped containers, anonymous volumes, and build cache. Image churn rose after the team switched to multi-stage builds last month. No application bug. Triage and remediate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Diagram & narrate the incident
Loading whiteboard…
Run or narrate your approach, then ask the coach.