The storage stack

Every read and write travels down a ladder of layers — each one slower but more durable than the last. The whole craft of storage is keeping the data you reach for most in the fast layers near the top.

The idea

Think of how you keep things at home. The notebook open on your desk is grabbed in a heartbeat. The drawer beside you takes a moment. The filing cabinet in the closet takes a short walk. The safe-deposit box at the bank takes a whole trip — but it's the one place nothing ever gets lost.

A storage system is built the same way, as a stack of four layers: client → cache → storage engine → disk. Each step down is slower but larger and more durable. Reads try the cache first: a hit returns in nanoseconds, while a miss falls all the way through to disk and pays milliseconds. Writes go through the engine, land in a durable journal so a crash can't lose them, then refresh the cache.

Because a miss can be tens of thousands of times slower than a hit (~5 ms versus ~100 ns), the entire game is keeping the hot working set in the layers near the top.
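To make the fall-through concrete, here is a small Python model of the stack. The `LAYERS` list and `read_cost` function are hypothetical, and the latencies are the rough ~100 ns / ~1 µs / ~5 ms figures this section uses, not measurements. A read pays for every layer it visits until it reaches the one holding the key, so a miss that reaches disk costs the sum of all three.

```python
# Hypothetical per-layer latencies (seconds), using this section's rough figures.
LAYERS = [
    ("cache", 100e-9),            # ~100 ns, volatile
    ("storage engine", 1e-6),     # ~1 µs, volatile buffer pool
    ("disk", 5e-3),               # ~5 ms, durable
]

def read_cost(found_in: str) -> float:
    """Seconds spent walking down the stack until reaching the layer holding the key."""
    total = 0.0
    for name, latency in LAYERS:
        total += latency          # every layer visited adds its own latency
        if name == found_in:
            return total
    raise ValueError(f"unknown layer: {found_in}")

hit = read_cost("cache")
miss = read_cost("disk")
print(f"hit  {hit * 1e9:.0f} ns")
print(f"miss {miss * 1e3:.4f} ms, about {miss / hit:,.0f}x a hit")
```

The disk term dwarfs the other two, which is why the ratio lands near 50,000 regardless of the exact cache and engine figures.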

Figure: the storage stack. Client (your app) → Cache (~100 ns) → Storage engine (~1 µs) → Disk and WAL (~5 ms).

How it works

A read tries the cache, and only pays for the slow layers on a miss — then it populates the cache so the next read is a hit. A write logs to a durable journal first so a crash can never lose it, then updates the engine and refreshes the cache.

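```python
# Toy in-memory stand-ins (assumed, for illustration; a real WAL fsyncs each record to disk).
class Layer(dict):
    def put(self, key, value): self[key] = value
    lookup = dict.get              # engine read path
    apply = put                    # engine write path
class Journal(list):
    def append(self, key, value): super().append((key, value))
class Ack: """Placeholder write acknowledgement."""
cache, engine, wal = Layer(), Layer(), Journal()

def read(key):
    val = cache.get(key)
    if val is not None:
        return val                 # cache hit  (~100 ns)
    val = engine.lookup(key)       # miss -> storage engine + disk (~ms)
    if val is not None:            # don't cache a key that doesn't exist
        cache.put(key, val)        # populate so next read is a hit
    return val

def write(key, value):
    wal.append(key, value)         # durable journal first (crash-safe)
    engine.apply(key, value)       # update the index / pages
    cache.put(key, value)          # keep cache coherent
    return Ack()
```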
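The "journal first" step only pays off if the journal can rebuild state after a crash. Below is a minimal sketch of that idea, assuming a one-JSON-record-per-line log file; `wal_append`, `recover`, and the `user:42:bio` key are hypothetical names for illustration. Each append is flushed and `fsync`ed before returning, and recovery replays records in order so the latest write for a key wins.

```python
import json
import os
import tempfile

def wal_append(path: str, key: str, value: str) -> None:
    """Append one record and force it to stable storage before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"key": key, "value": value}) + "\n")
        f.flush()                 # push Python's buffer to the OS
        os.fsync(f.fileno())      # ask the OS to push it to the device

def recover(path: str) -> dict:
    """Rebuild engine state after a crash by replaying the journal in order."""
    state = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                state[record["key"]] = record["value"]   # later records win
    return state

# Simulate: log two writes, "crash" (no in-memory state kept), then recover from the file.
log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal_append(log_path, "user:42:bio", "hello")
wal_append(log_path, "user:42:bio", "hello, world")
engine = recover(log_path)
print(engine)
```

This sketch leaves out things a real engine needs, such as checkpointing so the log can be truncated and tolerating a torn final line left by a crash mid-append.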

Cost

Latency grows by orders of magnitude as you descend. Faster layers are small and volatile (they vanish on a restart); slower layers are large and durable.

| Layer | Typical latency | Size | Survives a crash? |
|---|---|---|---|
| Cache (memory / Redis) | ~100 ns | GBs | No (volatile) |
| Storage engine (buffer pool) | ~1 µs | GBs | No (volatile) |
| SSD | ~100 µs | TBs | Yes (durable) |
| HDD | ~5–10 ms | TBs | Yes (durable) |

A read served from cache is roughly 50,000× faster than one that falls through to an HDD. That gap is why hit rate — not raw disk speed — usually decides your p99.
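A quick way to see why hit rate governs the tail: the mean latency is h · t_hit + (1 − h) · t_miss, but p99 is simply whichever latency the 99th-percentile request gets. The sketch below assumes the table's ~100 ns hit and ~5 ms HDD miss, and an idealized batch of 10,000 requests with exactly the given hit rate; the function names are hypothetical.

```python
HIT_NS = 100            # cache hit, ~100 ns
MISS_NS = 5_000_000     # fall-through to HDD, ~5 ms

def mean_latency_ns(hit_rate: float) -> float:
    """Expected latency: h * t_hit + (1 - h) * t_miss."""
    return hit_rate * HIT_NS + (1 - hit_rate) * MISS_NS

def p99_ns(hit_rate: float, requests: int = 10_000) -> int:
    """Nearest-rank p99 over an idealized batch with exactly that hit rate."""
    hits = round(hit_rate * requests)
    latencies = sorted([HIT_NS] * hits + [MISS_NS] * (requests - hits))
    rank = (99 * requests + 99) // 100     # ceil(0.99 * n) in integer arithmetic
    return latencies[rank - 1]             # nearest-rank is 1-based

for h in (0.90, 0.98, 0.999):
    print(f"hit rate {h:.1%}: mean {mean_latency_ns(h) / 1000:,.1f} µs, p99 {p99_ns(h):,} ns")
```

As long as more than 1% of requests miss, p99 sits at the full miss cost; once the miss rate drops below 1%, p99 collapses to the hit cost.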

Watch out for

Worked example

A profile page asks for user 42. The very first load misses the cache and pays a ~5 ms disk read; the value is copied into the cache on the way back up.

The next 10,000 loads of that same profile are cache hits at ~100 ns each, about 1 ms in total and less than the single miss that came before them. One slow read bought ten thousand fast ones.
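Averaged over all 10,001 loads, using the ~5 ms miss and ~100 ns hit figures above, the one miss is spread thin:

```latex
\bar{t} = \frac{1 \times 5\,\text{ms} + 10{,}000 \times 100\,\text{ns}}{10{,}001}
        = \frac{5\,\text{ms} + 1\,\text{ms}}{10{,}001}
        \approx 0.6\,\mu\text{s}
```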

Then the user edits their bio. The write appends to the WAL first (so a crash mid-write can't lose it), updates the engine, and refreshes the cache with the new value — so the very next read is both fresh and fast. That single discipline — journal, then apply, then cache — is what keeps the layers honest.
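Skipping the "then cache" step is what lets a layer lie. This sketch, using plain dicts as hypothetical stand-ins for the cache and engine, replays the profile example with and without refreshing the cache on write. The WAL append is omitted to keep the focus on cache coherence, and `read`, `write`, and `bio_after_edit` are illustrative names, not an existing API.

```python
def read(cache: dict, engine: dict, key):
    """Read-through: serve from cache, else fill it from the engine."""
    if key in cache:
        return cache[key]
    cache[key] = engine[key]
    return cache[key]

def write(cache: dict, engine: dict, key, value, refresh_cache: bool) -> None:
    """Apply a write to the engine (WAL step omitted); optionally refresh the cache."""
    engine[key] = value
    if refresh_cache:
        cache[key] = value

def bio_after_edit(refresh_cache: bool) -> str:
    engine = {"user:42:bio": "old bio"}
    cache: dict = {}
    read(cache, engine, "user:42:bio")            # first load warms the cache
    write(cache, engine, "user:42:bio", "new bio", refresh_cache)
    return read(cache, engine, "user:42:bio")     # the very next read

print("skip cache refresh:", bio_after_edit(False))
print("refresh cache:     ", bio_after_edit(True))
```

Without the refresh, the next read is a fast hit on the old value. Deleting the cached entry instead of overwriting it would also avoid the stale read, at the cost of one extra miss.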

Check yourself

A cache miss on this stack falls through to disk and pays ~5 ms, while a hit returns in ~100 ns. Your service runs at a 90% hit rate, and someone proposes buying faster disks to cut tail latency. What actually moves p99 the most?