Code Room
On-call | Medium | oc-g028
Subject: Saturation | Level: Mid–Senior | ~30 min | Common in Storage & CDN interviews | Industries: Technology, Software development

Question

A single-host analytics ingestion service (Java) suddenly slows to a crawl at 03:00. Throughput drops 80%, GC logs show frequent long pauses, and `free -m` shows almost no free memory and a large amount in 'buff/cache'. `iostat` shows the data disk at 100% util with high `await`. CPU is mostly in `iowait` and `%sy`. No deploy happened, but a config change yesterday set DEBUG-level logging on a hot path, and log files have grown to hundreds of GB. The host also runs the embedded storage engine on the same disk. Triage and mitigate.
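One signal in the scenario is easy to misread: low "free" memory with a large 'buff/cache' figure is not, by itself, memory exhaustion, because much of the page cache is reclaimable. A minimal sketch of how that could be checked, assuming a Linux host where `/proc/meminfo` exposes `MemAvailable`, `Dirty`, and `Writeback` (the class and method names here are hypothetical, not part of the service):

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch: interpret /proc/meminfo-style text during triage. */
final class MemInfoCheck {

    /** Parses lines such as "MemAvailable:   123456 kB" into a name -> value map (kB). */
    static Map<String, Long> parse(String meminfo) {
        Map<String, Long> kb = new HashMap<>();
        for (String line : meminfo.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2 && parts[0].endsWith(":")) {
                String key = parts[0].substring(0, parts[0].length() - 1);
                try {
                    kb.put(key, Long.parseLong(parts[1]));
                } catch (NumberFormatException ignored) {
                    // Skip lines whose value is not a plain integer.
                }
            }
        }
        return kb;
    }

    /** Share of total memory the kernel estimates can be used without swapping. */
    static double availableFraction(Map<String, Long> kb) {
        long total = kb.getOrDefault("MemTotal", 0L);
        long available = kb.getOrDefault("MemAvailable", 0L);
        return total == 0 ? 0.0 : (double) available / total;
    }

    /** Page-cache data still waiting to reach the disk (Dirty + Writeback), in kB. */
    static long pendingWritebackKb(Map<String, Long> kb) {
        return kb.getOrDefault("Dirty", 0L) + kb.getOrDefault("Writeback", 0L);
    }
}
```

On a live host the input could come from `Files.readString(Path.of("/proc/meminfo"))`. A healthy `availableFraction` alongside a large `pendingWritebackKb` would be consistent with the disk, not RAM, being the saturated resource, though it does not prove it on its own.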

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
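For the "what prevents a repeat" part, one option a candidate might propose is capping how much a hot path can log, so that an accidental DEBUG setting cannot saturate the disk again. The following is a minimal sketch of a fixed-window limiter under that assumption; `HotPathLogLimiter` and its parameters are hypothetical names, and a production service would more likely rely on its logging framework's own filtering or sampling features.

```java
/** Minimal sketch: allow at most maxPerWindow log lines per fixed time window. */
final class HotPathLogLimiter {
    private final long maxPerWindow;
    private final long windowNanos;
    private boolean started;
    private long windowStart;
    private long emittedInWindow;
    private long suppressed;

    HotPathLogLimiter(long maxPerWindow, long windowNanos) {
        if (maxPerWindow < 0 || windowNanos <= 0) {
            throw new IllegalArgumentException("maxPerWindow must be >= 0 and windowNanos > 0");
        }
        this.maxPerWindow = maxPerWindow;
        this.windowNanos = windowNanos;
    }

    /**
     * Returns true if the caller may emit a log line at time nowNanos.
     * The clock is passed in so the behaviour can be tested deterministically.
     */
    synchronized boolean tryAcquire(long nowNanos) {
        // Start a new window on first use or once the current one has elapsed.
        if (!started || nowNanos - windowStart >= windowNanos) {
            started = true;
            windowStart = nowNanos;
            emittedInWindow = 0;
        }
        if (emittedInWindow < maxPerWindow) {
            emittedInWindow++;
            return true;
        }
        suppressed++;
        return false;
    }

    /** Number of log lines dropped so far; worth exporting as a metric. */
    synchronized long suppressedCount() {
        return suppressed;
    }
}
```

A call site would guard the log statement with `limiter.tryAcquire(System.nanoTime())`. The `synchronized` methods keep the sketch simple but add contention on a very hot path; the point is the bounded write rate, not this particular implementation. It complements, rather than replaces, log rotation with size caps and keeping logs off the storage engine's disk.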
