Question
An image-resizing service that caches every derived thumbnail as a separate file under /var/cache/thumbs starts failing with ENOSPC on writes at 09:30, yet `df -h` on that volume shows only 60% of bytes used. Resized images are tiny (a few KB each), and the cache has accumulated tens of millions of them in a flat directory structure since the cache-cleanup cron was accidentally disabled in a refactor three weeks ago. A campaign this morning drove a surge of new unique image-variant requests. Beyond the write failures, directory listings and lookups on that path have become very slow. Triage and remediate.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
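One real signal worth gathering early is inode usage next to byte usage, since ENOSPC with free bytes and tens of millions of tiny files is consistent with inode exhaustion. The following is a minimal sketch, not a definitive diagnostic: it reads both figures through `os.statvfs`, roughly the data behind `df -h` and `df -i`. The function names and the 99% threshold are hypothetical, and the byte percentage is an approximation that ignores root-reserved blocks.

```python
import os


def usage_percentages(path):
    """Return (byte_pct, inode_pct) used on the filesystem holding path.

    inode_pct is None when the filesystem reports no fixed inode total.
    """
    st = os.statvfs(path)
    # f_blocks is the total block count, f_bfree the free block count.
    if st.f_blocks:
        byte_pct = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks
    else:
        byte_pct = 0.0
    # f_files is the total inode count, f_ffree the free inode count.
    if st.f_files:
        inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    else:
        inode_pct = None
    return byte_pct, inode_pct


def likely_inode_exhaustion(byte_pct, inode_pct, threshold=99.0):
    """Hypothetical heuristic: inodes are full while bytes are not."""
    return (
        inode_pct is not None
        and inode_pct >= threshold
        and byte_pct < threshold
    )


if __name__ == "__main__":
    # Assumes the cache path from the scenario; use "." to try it elsewhere.
    target = "/var/cache/thumbs" if os.path.isdir("/var/cache/thumbs") else "."
    bytes_used, inodes_used = usage_percentages(target)
    print("bytes used %:", round(bytes_used, 1))
    print("inodes used %:", inodes_used)
    print("inode exhaustion suspected:",
          likely_inode_exhaustion(bytes_used, inodes_used))
```

If the inode figure is at or near 100% while bytes sit around 60%, that supports treating the disabled cleanup cron as the root cause and the campaign surge as the trigger. Any other result means this hypothesis should be dropped in favor of the next one.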