Snapshot bloat and disk pressure

Snapshots are cheap to take but they keep old blocks alive — take enough and never prune, and they quietly eat the disk until writes start failing.

The idea

A copy-on-write snapshot captures a point-in-time view almost for free: it shares every block with the live volume and only diverges as the live data changes. The catch is that each retained snapshot pins the old version of every block the live volume later rewrites, so those old blocks cannot be freed for as long as any snapshot still references them.

Take snapshots frequently and never prune them, and disk usage grows with churn × retention rather than with the size of the live data, which can sit flat. When free space falls below a threshold the system hits disk pressure: writes slow, then fail, the database may flip to read-only, and the node may be evicted. The fix is a retention policy: prune or merge old snapshots, and alert on free space and snapshot count, not just live-data size.
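The alerting rule above can be sketched as a small check. This is a minimal illustration, not any monitoring tool's API: the VolumeStats fields, the disk_alerts name, and the default thresholds (15% minimum free space, 200 snapshots, pinned space larger than the live data) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class VolumeStats:
    capacity_gb: float     # total size of the disk
    used_gb: float         # everything allocated: live data plus snapshot-pinned blocks
    live_gb: float         # size of the live data only
    snapshot_count: int    # snapshots currently retained


def disk_alerts(stats, min_free_fraction=0.15, max_snapshots=200):
    """Return alert messages; the thresholds are illustrative defaults."""
    alerts = []
    free_fraction = (stats.capacity_gb - stats.used_gb) / stats.capacity_gb
    if free_fraction < min_free_fraction:
        alerts.append(f"low free space: {free_fraction:.0%} free")
    if stats.snapshot_count > max_snapshots:
        alerts.append(f"too many snapshots: {stats.snapshot_count}")
    # Hypothetical heuristic: snapshots hold more space than the data itself.
    pinned_gb = stats.used_gb - stats.live_gb
    if pinned_gb > stats.live_gb:
        alerts.append(f"snapshots pin {pinned_gb:.0f} GB, more than the live data")
    return alerts


# Live data is flat at 100 GB, yet the disk is almost full.
stats = VolumeStats(capacity_gb=256, used_gb=250, live_gb=100, snapshot_count=720)
for message in disk_alerts(stats):
    print(message)
```

A check on live_gb alone would stay silent here, which is the failure mode the paragraph warns about.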

How it works

A copy-on-write write never destroys a block version that a snapshot still references. Instead it first copies the old block aside (the snapshots are remapped to that copy) and only then writes the new data for the live volume. The retention loop reclaims space by dropping snapshots past the retention age and then freeing only those pinned blocks that no remaining snapshot references.

// copy-on-write: never clobber a block a snapshot still needs
function cow_write(volume, block_id, new_data):
    if any_snapshot_references(block_id):       // shared with a snapshot
        old = volume.blocks[block_id]
        copy = allocate_new_block()             // pins old version on disk
        copy.data = old.data
        for snap in snapshots_referencing(block_id):
            snap.remap(block_id -> copy)        // snapshot keeps the old view
    volume.blocks[block_id] = write(new_data)   // live volume diverges

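// retention: drop expired snapshots, then free blocks nothing references
function prune(snapshots, max_age_days):
    for snap in copy_of(snapshots):             // iterate a copy; drop() mutates the list
        if snap.age_days > max_age_days:        // age and limit are both in days
            drop(snap)                          // releases this snapshot's references only
    for blk in pinned_blocks():
        if not any_snapshot_references(blk):    // no remaining snapshot needs it
            free(blk)                           // space finally returns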
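For readers who want to run the idea, here is a small in-memory model of the pseudocode above. It is a sketch under simplifying assumptions: the Block, Snapshot, and Volume classes are hypothetical, a snapshot copies the block map (so creation is not O(1) here, unlike a real implementation), and "freeing" is simply a block no longer being referenced.

```python
class Block:
    """One physical block on disk."""

    def __init__(self, data):
        self.data = data


class Snapshot:
    def __init__(self, blocks, age_days):
        # Copying the mapping shares every Block object; no block data is copied.
        self.blocks = dict(blocks)
        self.age_days = age_days


class Volume:
    def __init__(self, contents):
        self.blocks = {i: Block(d) for i, d in enumerate(contents)}
        self.snapshots = []

    def snapshot(self, age_days=0):
        snap = Snapshot(self.blocks, age_days)
        self.snapshots.append(snap)
        return snap

    def cow_write(self, block_id, new_data):
        live = self.blocks[block_id]
        # Snapshots that still share this exact block with the live volume.
        sharers = [s for s in self.snapshots if s.blocks.get(block_id) is live]
        if sharers:
            copy = Block(live.data)            # pins the old version
            for snap in sharers:
                snap.blocks[block_id] = copy   # snapshots keep the old view
        live.data = new_data                   # live volume diverges

    def prune(self, max_age_days):
        # Dropping a snapshot releases only its references; a block is
        # reclaimed once nothing else points at it.
        self.snapshots = [s for s in self.snapshots if s.age_days <= max_age_days]

    def blocks_in_use(self):
        in_use = {id(b) for b in self.blocks.values()}
        for snap in self.snapshots:
            in_use.update(id(b) for b in snap.blocks.values())
        return len(in_use)


vol = Volume(["a", "b", "c", "d"])
old = vol.snapshot(age_days=10)
vol.cow_write(0, "A")
recent = vol.snapshot(age_days=1)
vol.cow_write(0, "A2")
vol.cow_write(1, "B")
print(vol.blocks_in_use())   # 7: four live blocks plus three pinned old versions
vol.prune(max_age_days=7)    # drops the 10-day-old snapshot
print(vol.blocks_in_use())   # 6: only one pinned block was unique to it
```

The dropped snapshot referenced four blocks, but only one of them was referenced by nothing else, so pruning it returns a single block.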

Trade-offs

Aspect	Cost	Signal to watch
Snapshot creation	O(1) — just a new reference, no data copied	Snapshot count climbing without bound
Space	Grows with churn × retention, not live-data size	Used space far above live-data size
Performance	Copy-on-write write amplification and fragmentation	Write latency creeping up over time
Deletion	Pruning frees only blocks no remaining snapshot shares	Reclaimed space far below the snapshot’s logical size
Recovery time	Point-in-time restore is fast, but each kept point costs space	Retention depth vs free-space headroom

Watch out for

Unbounded retention with no prune or merge policy — snapshots pile up forever and only ever add to used space.
High write churn multiplies the cost: every rewritten block under a snapshot pins an extra copy, so busy volumes bloat fastest.
Alerting on live-data size but not snapshot-pinned space — the dashboard looks flat while the disk silently fills.
Deleting a snapshot frees less than expected because its blocks are still shared with newer snapshots or the live volume.
Disk pressure cascades: writes fail, the database flips to read-only, and the node may be evicted; long snapshot chains also slow reads.
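To see why deleting a snapshot can free so little, it helps to estimate the reclaimable space before deleting. The sketch below assumes each snapshot and the live volume can be described as a set of physical block ids; the reclaimable_blocks function and the block numbers are made up for illustration.

```python
def reclaimable_blocks(target, others, live):
    """Blocks freed by deleting `target`: those that neither another
    snapshot nor the live volume still references."""
    still_needed = set(live)
    for snap in others:
        still_needed |= snap
    return target - still_needed


oldest = {1, 2, 3, 4}    # physical blocks the oldest snapshot references
newer = {1, 2, 3, 5}     # taken after block 4 was rewritten as block 5
live = {6, 7, 3, 5}      # blocks 1 and 2 have since been rewritten as 6 and 7

freed = reclaimable_blocks(oldest, [newer], live)
print(f"{len(freed)} of {len(oldest)} blocks would be freed")
```

Only block 4 is unique to the oldest snapshot here; blocks 1 and 2 are still pinned by the newer snapshot and block 3 is still live.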

Worked example

Take a volume holding 100 GB of live data, with 5% daily churn and hourly snapshots kept for 30 days. Each day rewrites about 5 GB of blocks; because some snapshot always covers them, every rewrite pins the old version, so roughly 5 GB of new pinned space accrues per day on top of the steady 100 GB of live data. Over the 30-day retention window that is about 30 × 5 = 150 GB of snapshot-pinned blocks, or about 250 GB used in total while the live data never leaves 100 GB.

On a 256 GB disk, usage passes 170 GB (about two-thirds full) within roughly two weeks, crossing the warning watermark, and is nearly full at about 250 GB by day 30. The live data stays flat while the snapshot-pinned space climbs day by day, taking the disk from healthy through warning and into disk pressure, until a prune drops the oldest snapshots and their uniquely pinned blocks are finally freed.
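The arithmetic of this example can be reproduced with a short projection. It is a rough model, assuming each day's rewrites pin a full day's churn of old blocks and that snapshots older than the retention window are pruned, so pinned space stops growing once the window is full; the used_gb function and its parameter names are illustrative.

```python
def used_gb(day, live_gb=100.0, daily_churn=0.05, retention_days=30):
    """Estimated space used after `day` days: live data plus pinned blocks."""
    # Pinned space grows by one day's churn until the retention window is
    # full, then stays level as old snapshots are pruned.
    pinned = live_gb * daily_churn * min(day, retention_days)
    return live_gb + pinned


DISK_GB = 256.0

for day in (7, 14, 21, 30, 45):
    used = used_gb(day)
    print(f"day {day:2d}: {used:5.0f} GB used, {used / DISK_GB:.0%} of the disk")
```

Under these assumptions the estimate is 170 GB at day 14 and levels off at 250 GB from day 30 onward, which leaves only about 6 GB of headroom on the 256 GB disk.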

Check yourself

Live data is steady at 100 GB, but the disk keeps filling toward full. What is the most likely cause?

You delete the oldest snapshot to recover space, but barely any frees. Why?