Content delivery networks

Keep a copy of the answer near the person asking. The first visitor pays the long trip; everyone after them gets it from next door.

The idea

A content delivery network is a tiered hierarchy of caches spread across the globe. A user's request lands at the nearest edge PoP (point of presence). If that edge already holds the object — a cache hit — it's served immediately, just a few milliseconds away.

On a miss, the edge can't answer alone. It walks up the hierarchy — to a regional parent cache, and if needed to the origin — fetches the object, stores a copy at each tier per its TTL, and serves it. The next request for that same object is a hit. That single trick — fill once, serve many — is what cuts latency and offloads the origin.

How it works

Three tiers, closest to farthest. The edge PoP sits nearest the user — many of them, worldwide. Behind a group of edges is a regional (parent) cache, larger and shared. Behind everything sits the origin — the one source of truth, far away and expensive to reach.

Every cached copy carries a TTL (time to live). While the copy is fresh, a hit serves it instantly. Once it expires, the tier treats the next request as a miss and refetches or revalidates the object from its parent. The fraction of requests answered without touching the origin is the hit ratio, the number you optimise.

# Sketch: `cache` is this tier's local store, `parent_or_origin` the next tier up.
def edge_lookup(key):
    obj = cache.get(key)
    if obj is not None and not obj.expired():  # cache HIT: fresh copy
        return obj                             # served at the edge, low latency

    # MISS (absent or expired): climb the hierarchy
    obj = parent_or_origin.fetch(key)          # regional cache, then origin
    cache.store(key, obj, ttl=obj.ttl)         # fill this tier for next time
    return obj

The win compounds: a miss fills the edge and every parent on the path, so the next user near a different edge in the same region benefits too. Misses are the cost; hits are the payoff.
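The fill-on-miss behaviour described above can be sketched as a small, runnable model. This is a minimal illustration rather than how any particular CDN is built: the names `Origin`, `CacheTier`, and `Entry` are hypothetical, each tier applies its own fixed TTL from the moment it fills, and eviction and revalidation are left out.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    body: bytes
    expires_at: float  # clock value after which the copy counts as stale


class Origin:
    """Hypothetical source of truth; counts how often it is asked."""

    def __init__(self, objects):
        self.objects = objects
        self.fetches = 0

    def fetch(self, key):
        self.fetches += 1
        return self.objects[key]


class CacheTier:
    """One cache tier (edge or regional) in front of an upstream."""

    def __init__(self, name, upstream, ttl_seconds, clock=time.monotonic):
        self.name = name
        self.upstream = upstream        # another CacheTier, or the Origin
        self.ttl_seconds = ttl_seconds  # assumption: one TTL per tier
        self.clock = clock
        self.store = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, key):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and now < entry.expires_at:
            self.hits += 1              # fresh copy: serve locally
            return entry.body
        self.misses += 1                # absent or expired: go upstream
        body = self.upstream.fetch(key)
        self.store[key] = Entry(body, now + self.ttl_seconds)
        return body


origin = Origin({"/story.jpg": b"image-bytes"})
regional = CacheTier("apac", origin, ttl_seconds=300)
tokyo = CacheTier("tokyo", regional, ttl_seconds=300)
singapore = CacheTier("singapore", regional, ttl_seconds=300)

tokyo.fetch("/story.jpg")      # cold: misses at edge and regional, one origin fetch
tokyo.fetch("/story.jpg")      # hit at the Tokyo edge
singapore.fetch("/story.jpg")  # misses at its edge, hits the regional cache
print(origin.fetches, regional.hits, tokyo.hits)
```

Because every tier exposes the same `fetch` method, an edge does not need to know whether its upstream is a regional cache or the origin. That is what lets one miss fill every tier on the path.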

Signals & trade-offs

Lever	Effect	Watch
Long TTL	High hit ratio, origin barely touched	Stale content lingers past a change
Short TTL	Content stays fresh	More misses, more origin load
More PoPs	Lower latency, closer to users	Higher cost, harder consistency
Regional tier	Absorbs edge misses, shields origin	Extra hop on a cold object
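To get a rough feel for the TTL lever, one simple model (an assumption, not something measured here) treats requests for a single object as arriving at random at a steady average rate, with the TTL starting at fill time and not being extended by hits. Each fill is then followed by `rate * ttl` expected hits before the next miss. The function names below are hypothetical.

```python
def ttl_hit_ratio(requests_per_second: float, ttl_seconds: float) -> float:
    """Expected hit ratio for one object at one cache under the model above."""
    expected_hits_per_fill = requests_per_second * ttl_seconds
    # Each cycle is one miss (the fill) plus the hits served while fresh.
    return expected_hits_per_fill / (1.0 + expected_hits_per_fill)


def upstream_fetches_per_hour(requests_per_second: float, ttl_seconds: float) -> float:
    """Expected fills per hour: one per cycle of TTL plus the wait for the next request."""
    cycle_seconds = ttl_seconds + 1.0 / requests_per_second
    return 3600.0 / cycle_seconds


for ttl in (1, 60, 300, 86_400):
    ratio = ttl_hit_ratio(10.0, ttl)
    fills = upstream_fetches_per_hour(10.0, ttl)
    print(f"ttl={ttl:>6}s  hit ratio={ratio:.4f}  fills/hour={fills:.1f}")
```

Under these assumptions the hit ratio climbs quickly for popular objects and flattens out. Very long TTLs add little extra offload while lengthening the window in which stale content can be served.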

Watch out for

Cache miss storm (thundering herd). When a popular object is cold or just expired, thousands of users miss at once and stampede the origin. Coalesce concurrent misses — one fetch, the rest wait on it — or use stale-while-revalidate.
Caching personalized or authenticated responses. A page baked for one logged-in user gets served to the next. Mark such responses Cache-Control: private or no-store, or key them per user; cache only what's truly shared.
Sloppy cache keys and Vary. If the key ignores query params, language, or encoding, different content collides on one entry. Include in the key every query parameter that changes the response, and list the request headers that change it (such as Accept-Language or Accept-Encoding) in Vary, or you serve the wrong bytes.
TTL too long. A 24-hour TTL on a file you just changed means stale content for everyone until it expires. Use versioned URLs or purge on deploy.
No request coalescing. Without it, a single cold object turns N user requests into N origin fetches. Collapse them into one in-flight fill.
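Request coalescing, as described in the list above, can be sketched with a lock and a table of in-flight fetches. This is a thread-based illustration under stated assumptions: `CoalescingFetcher` and `slow_origin` are hypothetical names, and the sketch only collapses concurrent fetches, so a real tier would also store the result in its cache.

```python
import threading
import time
from concurrent.futures import Future
from typing import Callable, Dict


class CoalescingFetcher:
    """Collapse concurrent misses for the same key into one upstream fetch."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]):
        self._fetch_upstream = fetch_upstream
        self._lock = threading.Lock()
        self._in_flight: Dict[str, Future] = {}

    def fetch(self, key: str) -> bytes:
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:                       # first miss: this caller does the fetch
                future = Future()
                self._in_flight[key] = future
        if not leader:
            return future.result()           # later misses wait on the leader
        try:
            future.set_result(self._fetch_upstream(key))
        except Exception as exc:             # share failures with the waiters too
            future.set_exception(exc)
        finally:
            with self._lock:
                del self._in_flight[key]
        return future.result()


origin_calls = []


def slow_origin(key: str) -> bytes:
    origin_calls.append(key)                 # record each trip to the origin
    time.sleep(0.3)                          # stand-in for a slow, distant origin
    return b"bytes of " + key.encode()


fetcher = CoalescingFetcher(slow_origin)
start = threading.Barrier(20)                # release all simulated users together
results = []


def user() -> None:
    start.wait()
    results.append(fetcher.fetch("/viral.jpg"))


threads = [threading.Thread(target=user) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), "responses from", len(origin_calls), "origin fetch(es)")
```

The first caller to miss becomes the leader and performs the fetch; everyone who arrives while it is in flight waits on the same `Future`. Stale-while-revalidate takes a different route: waiters are served the expired copy immediately while one refresh runs in the background.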

Worked example

A news site publishes a breaking story image. The first reader in Tokyo hits the Tokyo edge — a miss. The edge asks the Asia-Pacific regional cache (also empty), which fetches from the origin in Virginia: roughly 130 ms round trip. The image is stored at the regional cache and the Tokyo edge with a 5-minute TTL. The next thousand Tokyo readers each hit the edge at ~8 ms, and the origin never hears from them. A reader in Singapore hits a different edge — a miss there, but the Asia-Pacific regional cache already holds it, so it fills in ~25 ms without troubling Virginia. Five minutes later the TTL lapses; the next request revalidates and the cycle resets.
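The arithmetic behind this example can be checked in a few lines. The latencies are the approximate figures quoted above; treating the first request's latency as just the 130 ms origin round trip is a simplifying assumption, and the variable names are illustrative.

```python
# Approximate latencies in milliseconds, taken from the worked example.
ORIGIN_FILL_MS = 130    # first Tokyo reader: miss all the way to Virginia
EDGE_HIT_MS = 8         # each of the next thousand Tokyo readers
REGIONAL_FILL_MS = 25   # Singapore reader: edge miss, regional hit

latencies = [ORIGIN_FILL_MS] + [EDGE_HIT_MS] * 1000 + [REGIONAL_FILL_MS]
total_requests = len(latencies)
edge_hits = 1000
origin_fetches = 1

edge_hit_ratio = edge_hits / total_requests
origin_offload = 1 - origin_fetches / total_requests
mean_latency_ms = sum(latencies) / total_requests

print(f"requests:        {total_requests}")
print(f"edge hit ratio:  {edge_hit_ratio:.4f}")
print(f"origin offload:  {origin_offload:.4f}")
print(f"mean latency:    {mean_latency_ms:.2f} ms")
```

With these figures the origin serves one request out of 1,002, and the mean latency lands just above the 8 ms edge-hit time rather than anywhere near the 130 ms origin trip.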

Check yourself

A brand-new viral object is cold at the edge and ten thousand users request it in the same second. What protects the origin?