Code Room
On-call · Easy
Question
A single application server starts throwing 'No space left on device' errors and write operations fail. The host metrics show the root filesystem at 100% usage, climbing steadily over the last few days. The service has been deployed unchanged for two weeks and traffic is normal. The other servers in the pool are healthy and at 60% disk. What do you check first, and how do you get this host healthy?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
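To make "form hypotheses from real signals" concrete, the sketch below gathers two signals on the affected host: how full the filesystem is (blocks and inodes, since running out of either produces 'No space left on device') and which files on that filesystem are largest. It assumes a Linux host with Python 3. The function names fs_usage and largest_files are hypothetical helpers written for this illustration, not part of any existing tool.

```python
import heapq
import os
import stat
from typing import Dict, List, Tuple


def fs_usage(path: str = "/") -> Dict[str, float]:
    """Return block and inode usage (percent) for the filesystem holding path."""
    st = os.statvfs(path)
    used = st.f_blocks - st.f_bfree
    # Use "used + available to unprivileged users" as the denominator,
    # so root-reserved blocks do not hide a filesystem that is full for the app.
    visible = used + st.f_bavail
    block_pct = 100.0 * used / visible if visible else 0.0
    # Some filesystems report zero total inodes; treat that as 0% used.
    inodes_used = st.f_files - st.f_ffree
    inode_pct = 100.0 * inodes_used / st.f_files if st.f_files else 0.0
    return {"block_pct": block_pct, "inode_pct": inode_pct}


def _on_device(path: str, dev: int) -> bool:
    """True if path lives on the device with id dev (False if it cannot be read)."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def largest_files(root: str, top_n: int = 10) -> List[Tuple[int, str]]:
    """Return up to top_n (size_in_bytes, path) pairs, largest first.

    Only regular files on the same filesystem as root are considered,
    so other mounts (for example /proc or a data volume) are skipped.
    """
    root_dev = os.lstat(root).st_dev
    found: List[Tuple[int, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories that sit on a different device.
        dirnames[:] = [
            d for d in dirnames if _on_device(os.path.join(dirpath, d), root_dev)
        ]
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                info = os.lstat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):
                found.append((info.st_size, full))
    return heapq.nlargest(top_n, found)
```

Calling fs_usage("/") and then largest_files on a suspect directory (for example /var/log, used here only as an illustrative path) gives a ranked list to investigate. Because the other servers in the pool sit at 60%, running the same two calls on a healthy host and comparing the results is a quick way to see what is unique to the failing one. The walk only sees files that still have a directory entry, so space held by deleted files that a process keeps open will not appear in the list.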