Every read and write travels down a ladder of layers — each one slower but more durable than the last. The whole craft of storage is keeping the data you reach for most in the fast layers near the top.
Think of how you keep things at home. The notebook open on your desk is grabbed in a heartbeat. The drawer beside you takes a moment. The filing cabinet in the closet takes a short walk. The safe-deposit box at the bank takes a whole trip — but it's the one place nothing ever gets lost.
A storage system is built the same way, as a stack of four layers: client → cache → storage engine → disk. Each step down is slower but larger and more durable. Reads try the cache first — a hit returns in nanoseconds; a miss falls all the way through to disk and pays milliseconds. Writes go through the engine, land in a durable journal so a crash can't lose them, then refresh the cache.
Because a miss can be a thousand times slower than a hit, the entire game is keeping the hot working set in the layers near the top.
A read tries the cache, and only pays for the slow layers on a miss — then it populates the cache so the next read is a hit. A write logs to a durable journal first so a crash can never lose it, then updates the engine and refreshes the cache.
def read(key):
val = cache.get(key)
if val is not None:
return val # cache hit (~100 ns)
val = engine.lookup(key) # miss -> storage engine + disk (~ms)
cache.put(key, val) # populate so next read is a hit
return val
def write(key, value):
wal.append(key, value) # durable journal first (crash-safe)
engine.apply(key, value) # update the index / pages
cache.put(key, value) # keep cache coherent
return Ack()
Latency grows by orders of magnitude as you descend. Faster layers are small and volatile (they vanish on a restart); slower layers are large and durable.
| Layer | Typical latency | Size | Survives a crash? |
|---|---|---|---|
| Cache (memory / Redis) | ~100 ns | GBs | No — volatile |
| Storage engine (buffer pool) | ~1 µs | GBs | No — volatile |
| SSD | ~100 µs | TBs | Yes — durable |
| HDD | ~5–10 ms | TBs | Yes — durable |
A read served from cache is roughly 50,000× faster than one that falls through to an HDD. That gap is why hit rate — not raw disk speed — usually decides your p99.
A profile page asks for user 42. The very first load misses the cache and pays a ~5 ms disk read; the value is copied into the cache on the way back up.
The next 10,000 loads of that same profile are cache hits at ~100 ns each — effectively free. One slow read bought tens of thousands of fast ones.
Then the user edits their bio. The write appends to the WAL first (so a crash mid-write can't lose it), updates the engine, and refreshes the cache with the new value — so the very next read is both fresh and fast. That single discipline — journal, then apply, then cache — is what keeps the layers honest.
A cache miss on this stack falls through to disk and pays ~5 ms, while a hit returns in ~100 ns. Your service runs at a 90% hit rate, and someone proposes buying faster disks to cut tail latency. What actually moves p99 the most?