Keep a hot copy of slow data nearby, so most reads never have to touch the database.
A database read is slow — disk seeks, network hops, query parsing. If the same few keys get read over and over, paying that cost every single time is wasteful. So we put a small, fast store (memory) in front of the database.
On every read we check the cache first. A hit returns instantly and never wakes the database. A miss takes the slow trip to the database, then fills the cache on the way back — so the next read of that key is a hit. As popular keys warm up, the slow path fires less and less, and average latency falls toward the cache's speed.
In a read-through cache the cache sits inline: the application asks the cache for a key, and the cache library (or loader) is responsible for fetching from the database on a miss and storing the result. The app never talks to the database directly — it just calls cache.get(key).
Contrast that with cache-aside (the more common DIY pattern): the application checks the cache itself, and on a miss it queries the database and writes the value back into the cache by hand. Same hit/miss/fill cycle — the difference is just who owns the fill logic.

    # read-through get(key): the cache owns the miss path
    def get(key):
        value = cache.get(key)
        if value is not None:           # HIT: fast, DB untouched
            return value
        value = db.get(key)             # MISS: slow trip to the database
        cache.set(key, value, ttl)      # fill the cache for next time
        return value

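The function above shows the hit/miss/fill cycle. To make the ownership point concrete, here is a minimal sketch of a read-through cache as a class that is handed a loader and calls it on a miss. The names `ReadThroughCache`, `load_from_db`, and `fake_db` are hypothetical stand-ins, not a real library API, and the sketch leaves out TTLs and eviction.

```python
from typing import Callable, Dict, Hashable


class ReadThroughCache:
    """Toy in-memory read-through cache: callers only ever call get()."""

    def __init__(self, loader: Callable[[Hashable], object]):
        self._loader = loader          # hypothetical function that reads the database
        self._store: Dict[Hashable, object] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:         # HIT: answered from memory
            self.hits += 1
            return self._store[key]
        self.misses += 1               # MISS: the cache itself calls the loader
        value = self._loader(key)
        self._store[key] = value       # fill for next time
        return value


# Stand-in for a database table; a real loader would run a query.
fake_db = {"user:1": "Ada", "user:2": "Grace"}
db_calls = []


def load_from_db(key):
    db_calls.append(key)               # record every trip to the "database"
    return fake_db[key]


cache = ReadThroughCache(load_from_db)
first = cache.get("user:1")            # miss: the loader runs
second = cache.get("user:1")           # hit: the loader does not run
```

The application code only calls `cache.get(...)`. In a cache-aside version, the body of `get` would live in the application instead.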
Writes need a plan too. Write-through updates the cache and the database together on every write, so the cache is never stale (but writes pay the database cost). Write-back / write-behind updates the cache immediately and flushes to the database asynchronously — fast writes, but you risk losing buffered data on a crash. A TTL on each entry caps how stale a value can get: after it expires, the next read misses and re-fetches fresh data.
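The three ideas in that paragraph can be sketched side by side. This is a toy, single-process illustration under stated assumptions: `TTLCache`, `write_through`, and `WriteBehind` are hypothetical names, the "database" is a plain dict, and `flush()` is called by hand where a real write-behind cache would flush asynchronously in the background.

```python
import time


class TTLCache:
    """Toy cache whose entries expire ttl seconds after being written."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock              # injectable so expiry can be shown without sleeping
        self._store = {}                 # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:  # expired: treat as a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


def write_through(cache, db, key, value):
    """Update the database and the cache together on every write."""
    db[key] = value                      # the write pays the database cost
    cache.set(key, value)


class WriteBehind:
    """Update the cache now and buffer the database write for a later flush."""

    def __init__(self, cache, db):
        self._cache, self._db = cache, db
        self.pending = {}                # buffered writes; lost if the process crashes

    def write(self, key, value):
        self._cache.set(key, value)
        self.pending[key] = value

    def flush(self):
        self._db.update(self.pending)
        self.pending.clear()


now = [0.0]                              # fake clock, in seconds
cache = TTLCache(ttl=30, clock=lambda: now[0])
db = {}

write_through(cache, db, "a", 1)         # db and cache both updated
wb = WriteBehind(cache, db)
wb.write("b", 2)                         # cache updated, db not yet
db_before_flush = dict(db)
wb.flush()                               # db catches up

now[0] = 31.0                            # advance past the 30 s TTL
expired = cache.get("a")                 # entry has expired, so this is a miss
```

Anything still in `pending` when the process dies never reaches the database, which is the crash risk the paragraph describes.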
| Signal | Value | What it tells you |
|---|---|---|
| Hit latency | ~1ms | Served from memory; the database is never touched. |
| Miss latency | ~51ms | Cache probe + the slow DB fetch (~1 + ~50) + a tiny fill; the worked examples round this to 50ms. |
| Hit rate | hits / reads | The key health signal. Higher means more reads stay fast. |
| Effective avg latency | h·1ms + (1−h)·50ms | Weighted by hit rate h; drops as h climbs. |
| Cold start | low h at first | An empty cache misses on every new key until it warms up. |
With a 1ms hit and a 50ms miss, a 90% hit rate gives 0.9·1 + 0.1·50 = 5.9ms average — versus 50ms with no cache. The win is almost entirely about how often you hit.
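The same arithmetic can be written as a small helper to try other hit rates. This is only a sketch: `effective_latency_ms` is a hypothetical name, and the 1ms and 50ms defaults are the illustrative figures used in this section, not measurements.

```python
def effective_latency_ms(hit_rate, hit_ms=1.0, miss_ms=50.0):
    """Average read latency weighted by hit rate: h*hit + (1-h)*miss."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


for h in (0.0, 0.5, 0.9, 0.99):
    print(f"hit rate {h:.0%}: {effective_latency_ms(h):.2f} ms")
```

Going from a 90% to a 99% hit rate takes the average from about 5.9ms to about 1.49ms, because the remaining misses dominate the sum.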
To keep a cached entry from going stale, give it a TTL so it eventually expires, or use explicit invalidation that deletes or updates the cache entry on every write.
Start with an empty cache and run the read stream A B A C A. Hit = 1ms, miss = 50ms.
| Read | Result | Latency | Hit rate so far | Avg latency so far |
|---|---|---|---|---|
| A | miss → fill A | 50ms | 0 / 1 = 0% | 50.0ms |
| B | miss → fill B | 50ms | 0 / 2 = 0% | 50.0ms |
| A | hit | 1ms | 1 / 3 = 33% | 33.7ms |
| C | miss → fill C | 50ms | 1 / 4 = 25% | 37.8ms |
| A | hit | 1ms | 2 / 5 = 40% | 30.4ms |
Three misses and two hits. Total time = 50+50+1+50+1 = 152ms, so the average is 152 / 5 = 30.4ms. Notice the average falling each time A repeats — every hit is a 50ms trip we got to skip. Keep replaying A and the average keeps sliding toward 1ms.
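The trace can be replayed in a few lines to check the running numbers or to try other read streams. This is a minimal sketch assuming an unbounded cache with no TTL or eviction; `replay` is a hypothetical helper name.

```python
def replay(reads, hit_ms=1, miss_ms=50):
    """Replay a read stream against an empty cache and return one row per read."""
    cached = set()
    hits = 0
    total_ms = 0
    rows = []
    for n, key in enumerate(reads, start=1):
        if key in cached:
            hits += 1
            result, latency = "hit", hit_ms
        else:
            cached.add(key)              # fill on the way back from the database
            result, latency = "miss", miss_ms
        total_ms += latency
        # (key, result, latency, hit rate so far, average latency so far)
        rows.append((key, result, latency, hits / n, total_ms / n))
    return rows


rows = replay("ABACA")
for key, result, latency, hit_rate, avg in rows:
    print(f"{key} {result:<4} {latency:>2}ms  hit rate {hit_rate:.0%}  avg {avg:.1f}ms")
```

Appending more `A` reads to the stream, for example `replay("ABACA" + "A" * 50)`, shows the running average continuing to slide toward the hit latency.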
Out of 5 reads you got 3 hits and 2 misses (hit = 1ms, miss = 50ms). Roughly what's the average latency per read?
A single very hot key expires and, in the same instant, thousands of in-flight reads all miss. What's the failure mode, and the fix?
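This scenario is usually called a cache stampede (or thundering herd): every concurrent miss makes its own slow trip to the database for the same key. One common mitigation is request coalescing, often called single-flight, where the first miss loads the value and the others wait for it. The sketch below shows that idea with threads. `SingleFlightCache` and `slow_load` are hypothetical names, and a production version would also need TTLs and per-request timeouts.

```python
import threading
import time


class SingleFlightCache:
    """Toy cache where concurrent misses on one key share a single loader call."""

    def __init__(self, loader):
        self._loader = loader            # hypothetical slow database read
        self._store = {}
        self._lock = threading.Lock()    # guards _store and _inflight
        self._inflight = {}              # key -> Event set when the load finishes

    def get(self, key):
        while True:
            with self._lock:
                if key in self._store:
                    return self._store[key]      # hit
                event = self._inflight.get(key)
                if event is None:                # first miss: become the leader
                    event = self._inflight[key] = threading.Event()
                    break
            event.wait()                         # follower: wait, then re-check
        try:
            value = self._loader(key)            # only the leader reaches this line
            with self._lock:
                self._store[key] = value
            return value
        finally:
            with self._lock:
                del self._inflight[key]
            event.set()                          # wake the followers


calls = []


def slow_load(key):
    calls.append(key)
    time.sleep(0.05)                             # simulate the slow database trip
    return key.upper()


cache = SingleFlightCache(slow_load)
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get("hot")))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All twenty readers get the value, but `slow_load` runs only once. Adding random jitter to TTLs, so that hot keys do not all expire together, is another commonly used complement.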