Question
Your database runs on a copy-on-write filesystem or volume (ZFS, LVM-thin, or cloud snapshots) and takes hourly snapshots for point-in-time recovery, retained for 30 days. On-call is paged: the data volume is filling fast and is projected to hit 100% in ~5 hours, which threatens writes. The dashboards show: the live dataset size is roughly flat at ~2 TB, but total volume usage climbed from 4 TB to 9 TB over the past week; the snapshot count is normal (720), but the bytes held uniquely by each snapshot ('unique referenced') are huge and growing; a data-retention/GC job that rewrites or deletes large swaths of old rows ran heavily this week, yet no space was actually freed on disk. How do you triage snapshot bloat (space amplification) that is about to fill the volume?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
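
As a quick sanity check on the dashboard figures quoted in the question, the arithmetic can be sketched in Python. This is a minimal sketch, not a monitoring tool: the function names are hypothetical, 1 TB is treated as 1000 GB for simplicity, and the growth rate is a simple weekly average, which the question does not claim is the current rate.

```python
# Figures quoted in the question. 1 TB is treated as 1000 GB (an assumption).
LIVE_TB = 2.0            # live dataset size, roughly flat
USAGE_WEEK_AGO_TB = 4.0  # total volume usage a week ago
USAGE_NOW_TB = 9.0       # total volume usage now
HOURS_IN_WEEK = 7 * 24


def expected_snapshot_count(per_day: int, retention_days: int) -> int:
    """Snapshots that should exist under a fixed schedule and retention."""
    return per_day * retention_days


def snapshot_held_tb(total_tb: float, live_tb: float) -> float:
    """Space not explained by live data, i.e. blocks pinned by snapshots."""
    return total_tb - live_tb


def avg_growth_gb_per_hour(start_tb: float, end_tb: float, hours: float) -> float:
    """Average growth over the window; the current rate may be higher."""
    return (end_tb - start_tb) * 1000 / hours


def amplification(total_tb: float, live_tb: float) -> float:
    """Total bytes on the volume per byte of live data."""
    return total_tb / live_tb


if __name__ == "__main__":
    print("expected snapshots:", expected_snapshot_count(24, 30))
    print("held by snapshots (TB):", snapshot_held_tb(USAGE_NOW_TB, LIVE_TB))
    print("space amplification:", amplification(USAGE_NOW_TB, LIVE_TB))
    rate = avg_growth_gb_per_hour(USAGE_WEEK_AGO_TB, USAGE_NOW_TB, HOURS_IN_WEEK)
    print(f"average growth (GB/h): {rate:.1f}")
```

With these inputs, the expected snapshot count is 720 (matching the dashboard, so retention is running), about 7 TB of the 9 TB is pinned by snapshots rather than live data, and usage is 4.5 times the live dataset. That pattern is consistent with the GC job rewriting blocks that older snapshots still reference, though the figures alone do not prove it.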