Keep a copy of the answer near the person asking. The first visitor pays the long trip; everyone after them gets it from next door.
A content delivery network is a tiered hierarchy of caches spread across the globe. A user's request lands at the nearest edge PoP (point of presence). If that edge already holds the object — a cache hit — it's served immediately, just a few milliseconds away.
On a miss, the edge can't answer alone. It walks up the hierarchy — to a regional parent cache, and if needed to the origin — fetches the object, stores a copy at each tier per its TTL, and serves it. The next request for that same object is a hit. That single trick — fill once, serve many — is what cuts latency and offloads the origin.
Three tiers, closest to farthest. The edge PoP sits nearest the user — many of them, worldwide. Behind a group of edges is a regional (parent) cache, larger and shared. Behind everything sits the origin — the one source of truth, far away and expensive to reach.
Every cached copy carries a TTL (time to live). While it's fresh, a hit serves it instantly. Once it expires, the tier treats the object as a miss and refills from a parent. The fraction of requests answered without touching the origin is the hit ratio — the number you optimise.
# Sketch: `cache` is this edge's local store, `parent_or_origin` is the next tier up.
def edge_lookup(key):
    obj = cache.get(key)
    if obj is not None and not obj.expired():  # cache HIT: fresh copy
        return obj                             # served at the edge, low latency
    # MISS: climb the hierarchy
    obj = parent_or_origin.fetch(key)          # regional cache, then origin
    cache.store(key, obj, ttl=obj.ttl)         # fill this tier for next time
    return obj
The win compounds: a miss fills the edge and every parent on the path, so the next user near a different edge in the same region benefits too. Misses are the cost; hits are the payoff.
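To make the compounding fill concrete, here is a minimal sketch of a three-tier hierarchy. The `Tier` class, its field names, the 300-second TTL, and the object path are all illustrative assumptions, not part of any real CDN's API; the origin is modelled as a tier with no parent that always has the object.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Tier:
    """One cache tier (hypothetical model); the origin has no parent."""
    name: str
    parent: Optional["Tier"] = None
    ttl: float = 300.0  # seconds; illustrative value
    store: Dict[str, Tuple[bytes, float]] = field(default_factory=dict)
    fetches: int = 0    # how many requests reached this tier

    def get(self, key: str, now: float) -> bytes:
        self.fetches += 1
        if self.parent is None:
            # Origin: the source of truth, always able to answer.
            return f"body of {key}".encode()
        entry = self.store.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]                       # HIT: fresh copy at this tier
        body = self.parent.get(key, now)          # MISS: climb one level
        self.store[key] = (body, now + self.ttl)  # fill on the way back down
        return body


origin = Tier("origin")
regional = Tier("regional", parent=origin)
edge_a = Tier("edge-a", parent=regional)
edge_b = Tier("edge-b", parent=regional)

edge_a.get("/img/story.jpg", now=0.0)  # cold: edge-a -> regional -> origin
edge_a.get("/img/story.jpg", now=1.0)  # hit at edge-a
edge_b.get("/img/story.jpg", now=2.0)  # miss at edge-b, hit at regional

print(origin.fetches, regional.fetches)
```

In this run the origin is reached once and the regional cache twice: the second edge pays only the short hop to its parent, which is the shared benefit described above.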
| Lever | Effect | Watch |
|---|---|---|
| Long TTL | High hit ratio, origin barely touched | Stale content lingers past a change |
| Short TTL | Content stays fresh | More misses, more origin load |
| More PoPs | Lower latency, closer to users | Higher cost, harder consistency |
| Regional tier | Absorbs edge misses, shields origin | Extra hop on a cold object |
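The TTL rows of the table can be illustrated with a toy simulation. This is a sketch under simplifying assumptions: a single cache, one object, exactly one request per second for ten minutes, and expiry exactly `ttl` seconds after each fill. The function name and the traffic pattern are hypothetical.

```python
def hit_ratio(ttl: int, duration: int = 600) -> float:
    """Hit ratio for one object requested once per second for `duration` seconds."""
    hits = 0
    expires_at = 0  # nothing cached yet, so the request at t=0 is a miss
    for t in range(duration):
        if t < expires_at:
            hits += 1             # fresh copy: served from cache
        else:
            expires_at = t + ttl  # miss: refill and restart the TTL clock
    return hits / duration


for ttl in (5, 60, 300):
    print(f"ttl={ttl:>3}s  hit ratio={hit_ratio(ttl):.3f}")
```

Under these assumptions a 5-second TTL misses 120 times out of 600 requests, while a 300-second TTL misses only twice. The price of the longer TTL is the one the table names: a change at the origin can stay invisible for up to five minutes.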
Personalised responses: mark them no-store or key them per user; cache only what's truly shared.
Vary. If the key ignores query params, language, or encoding, different content collides on one entry. Get the cache key right or you serve the wrong bytes.
A news site publishes a breaking-story image. The first reader in Tokyo hits the Tokyo edge, which is a miss. The edge asks the Asia-Pacific regional cache (also empty), which fetches from the origin in Virginia, roughly 130 ms round trip. The image is stored at the regional cache and the Tokyo edge with a 5-minute TTL. The next thousand Tokyo readers each hit the edge at ~8 ms, and the origin never hears from them. A reader in Singapore hits a different edge. That is a miss there, but the Asia-Pacific regional cache already holds the image, so it fills in ~25 ms without troubling Virginia. Five minutes later the TTL lapses; the next request revalidates and the cycle resets.
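The numbers in the news-image example can be tallied in a few lines. This is a rough sketch: it assumes the first Tokyo reader experiences about the quoted 130 ms (the source gives that figure only for the trip to the origin), and the variable names are illustrative.

```python
# (description, number of requests, approximate latency in ms) from the example
served = [
    ("Tokyo first reader, filled from the origin", 1, 130),  # assumed ~130 ms
    ("Tokyo readers served by the edge", 1000, 8),
    ("Singapore reader, filled from the regional cache", 1, 25),
]

total = sum(count for _, count, _ in served)
mean_ms = sum(count * ms for _, count, ms in served) / total
origin_fetches = 1                    # only the very first request reached Virginia
offload = 1 - origin_fetches / total  # share of requests the origin never saw

print(f"{total} requests, mean {mean_ms:.2f} ms, origin offload {offload:.2%}")
```

Across the 1,002 requests the mean latency stays close to the 8 ms edge figure, because the single expensive origin fetch is amortised over every hit that follows it.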
A brand-new viral object is cold at the edge and ten thousand users request it in the same second. What protects the origin?