Content delivery networks

Keep a copy of the answer near the person asking. The first visitor pays the long trip; everyone after them gets it from next door.

The idea

A content delivery network is a tiered hierarchy of caches spread across the globe. A user's request lands at the nearest edge PoP (point of presence). If that edge already holds the object — a cache hit — it's served immediately, just a few milliseconds away.

On a miss, the edge can't answer alone. It walks up the hierarchy — to a regional parent cache, and if needed to the origin — fetches the object, stores a copy at each tier per its TTL, and serves it. The next request for that same object is a hit. That single trick — fill once, serve many — is what cuts latency and offloads the origin.
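To make "fill once, serve many" concrete, here is a minimal sketch of a single read-through cache tier. It assumes a plain dict stands in for one edge PoP and that `fetch_from_origin` is a hypothetical stand-in for the slow origin; it has no TTL and no parent tier, so it only shows the fill-then-hit pattern.

```python
# Minimal read-through cache: the dict plays one edge PoP, and
# fetch_from_origin is a hypothetical stand-in for the slow origin.
origin_calls = 0

def fetch_from_origin(key):
    """Pretend origin: counts how often the long trip is taken."""
    global origin_calls
    origin_calls += 1
    return f"<bytes of {key}>"

edge_cache = {}

def get(key):
    if key in edge_cache:              # hit: answered at the edge
        return edge_cache[key], "HIT"
    value = fetch_from_origin(key)     # miss: one long trip
    edge_cache[key] = value            # fill once...
    return value, "MISS"

for _ in range(5):                     # ...serve many
    print(get("/img/story.jpg")[1])    # MISS, then HIT four times
print("origin calls:", origin_calls)   # origin calls: 1
```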


How it works

Three tiers, closest to farthest. The edge PoP sits nearest the user — many of them, worldwide. Behind a group of edges is a regional (parent) cache, larger and shared. Behind everything sits the origin — the one source of truth, far away and expensive to reach.
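The three tiers can be modelled as a chain in which each cache knows only its parent. This is an illustrative sketch, not any CDN's actual design: the `Origin` and `Tier` classes are hypothetical, entries never expire, and each call reports which tier ultimately supplied the object.

```python
class Origin:
    """The single source of truth; every call here is the expensive trip."""
    def __init__(self):
        self.calls = 0

    def fetch(self, key):
        self.calls += 1
        return f"<bytes of {key}>", "origin"


class Tier:
    """A cache tier (edge or regional) that falls back to its parent on a miss."""
    def __init__(self, name, parent):
        self.name, self.parent, self.store = name, parent, {}

    def fetch(self, key):
        if key in self.store:
            return self.store[key], self.name       # answered at this tier
        value, served_by = self.parent.fetch(key)   # climb one level
        self.store[key] = value                     # fill on the way back down
        return value, served_by


origin = Origin()
regional = Tier("regional", origin)   # larger, shared parent
edge = Tier("edge", regional)         # nearest the user

print(edge.fetch("/logo.png")[1])     # origin  (cold at every tier)
print(edge.fetch("/logo.png")[1])     # edge    (filled by the first trip)
```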

Every cached copy carries a TTL (time to live). While it's fresh, a hit serves it instantly. Once it expires, the tier treats the next request as a miss and either refills the object from a parent or revalidates, asking upstream whether its copy is still current. The fraction of requests answered without touching the origin is the hit ratio, the number you optimise.
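A minimal sketch of TTL expiry and hit-ratio bookkeeping for one tier, assuming a hypothetical `TTLCache` class whose clock is passed in explicitly as `now` (in seconds) so the timing is easy to follow. For a single tier sitting directly in front of the origin, its hit ratio is exactly the fraction of requests that never reach the origin.

```python
class TTLCache:
    """One cache tier with per-entry expiry; time is supplied by the caller."""
    def __init__(self, ttl, fetch_parent):
        self.ttl = ttl
        self.fetch_parent = fetch_parent   # called on every miss
        self.entries = {}                  # key -> (value, expires_at)
        self.hits = self.misses = 0

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is not None and now < entry[1]:   # present and fresh: HIT
            self.hits += 1
            return entry[0]
        self.misses += 1                           # absent or expired: MISS
        value = self.fetch_parent(key)
        self.entries[key] = (value, now + self.ttl)
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = TTLCache(ttl=60, fetch_parent=lambda key: f"<bytes of {key}>")
for t in range(0, 180, 10):            # one request every 10 s for 3 minutes
    cache.get("/a.css", now=t)
print(cache.hits, cache.misses, round(cache.hit_ratio(), 3))  # 15 3 0.833
```

The misses land at t = 0, 60 and 120 s, each one the first request after the previous copy expired.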

def edge_lookup(key, cache, parent):
    """Serve key from this tier's cache, filling from parent on a miss."""
    obj = cache.get(key)
    if obj is not None and not obj.expired():  # cache HIT: fresh copy
        return obj                              # served at the edge, low latency

    # MISS (absent or expired): climb the hierarchy
    obj = parent.fetch(key)              # regional cache, then origin
    cache.store(key, obj, ttl=obj.ttl)   # fill this tier for next time
    return obj

The win compounds: a miss fills the edge and every parent on the path, so the next user near a different edge in the same region benefits too. Misses are the cost; hits are the payoff.
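The compounding effect shows up once two edges share a regional parent. In this sketch the `Origin` and `Tier` classes are hypothetical and entries never expire; the edge names simply echo the worked example. The second edge still misses locally, but its parent already holds the object, so the origin is contacted only once.

```python
class Origin:
    def __init__(self):
        self.calls = 0

    def fetch(self, key):
        self.calls += 1
        return f"<bytes of {key}>"


class Tier:
    """A cache tier that fills itself from its parent on a miss."""
    def __init__(self, parent):
        self.parent, self.store, self.misses = parent, {}, 0

    def fetch(self, key):
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.parent.fetch(key)  # fill this tier on the way back
        return self.store[key]


origin = Origin()
apac = Tier(origin)          # one regional parent...
tokyo = Tier(apac)           # ...shared by two edges
singapore = Tier(apac)

tokyo.fetch("/story.jpg")      # misses at edge and regional: one origin call
singapore.fetch("/story.jpg")  # misses at this edge, but the regional tier has it
print(origin.calls, tokyo.misses, singapore.misses, apac.misses)  # 1 1 1 1
```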

Signals & trade-offs

Lever | Effect | Watch
Long TTL | High hit ratio, origin barely touched | Stale content lingers past a change
Short TTL | Content stays fresh | More misses, more origin load
More PoPs | Lower latency, closer to users | Higher cost, harder consistency
Regional tier | Absorbs edge misses, shields origin | Extra hop on a cold object
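The long-versus-short TTL rows can be put in numbers with a toy model. It assumes one object, one edge, one request per second for ten minutes, and a full refill on every expiry (no revalidation); `simulate` is a hypothetical helper, not a real CDN metric. Under these assumptions a TTL of 30 s triggers 20 refills while 300 s triggers 2, but the longer TTL also means a change can stay invisible for up to 300 s.

```python
def simulate(ttl_s, duration_s=600, interval_s=1):
    """Return (refills, hit_ratio) for one object requested at a steady rate.

    A miss at time t caches the object until t + ttl_s."""
    requests = refills = 0
    expires_at = None
    for t in range(0, duration_s, interval_s):
        requests += 1
        if expires_at is None or t >= expires_at:   # cold or expired: refill
            refills += 1
            expires_at = t + ttl_s
    return refills, 1 - refills / requests


for ttl in (30, 300):
    refills, ratio = simulate(ttl)
    print(f"TTL {ttl:>3} s: {refills:>2} refills, hit ratio {ratio:.3f}, "
          f"worst-case staleness {ttl} s")
```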

Watch out for

Worked example

A news site publishes a breaking-story image. The first reader in Tokyo reaches the Tokyo edge and misses. The edge asks the Asia-Pacific regional cache (also empty), which fetches from the origin in Virginia: roughly 130 ms round trip. The image is stored at the regional cache and the Tokyo edge with a 5-minute TTL. The next thousand Tokyo readers each hit the edge at ~8 ms, and the origin never hears from them. A reader in Singapore reaches a different edge and misses there too, but the Asia-Pacific regional cache already holds the image, so that edge fills in ~25 ms without troubling Virginia. Five minutes after the first fill the TTL lapses; the next request must revalidate upstream (typically an HTTP conditional request, answered with 304 Not Modified and no body if the image is unchanged), and the cycle resets.
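The arithmetic behind the worked example can be replayed in a few lines. The latencies are the example's own approximate figures; treating each request as exactly that latency, and counting only the first Singapore reader, are simplifying assumptions for illustration.

```python
# Approximate latencies from the worked example (ms).
ORIGIN_RTT_MS = 130      # Tokyo edge -> Asia-Pacific regional -> Virginia origin
EDGE_HIT_MS = 8          # Tokyo edge hit
REGIONAL_FILL_MS = 25    # Singapore edge miss filled by the regional cache

tokyo = [ORIGIN_RTT_MS] + [EDGE_HIT_MS] * 1000   # first reader misses, next 1,000 hit
singapore = [REGIONAL_FILL_MS]                    # first Singapore reader only

requests = tokyo + singapore
origin_trips = sum(1 for ms in requests if ms == ORIGIN_RTT_MS)

print(f"requests: {len(requests)}")                            # requests: 1002
print(f"mean Tokyo latency: {sum(tokyo) / len(tokyo):.2f} ms")  # 8.12 ms
print(f"origin offload: {1 - origin_trips / len(requests):.2%}")  # 99.90%
```

One slow request is amortised over a thousand fast ones, so the average Tokyo reader sees almost exactly the edge latency.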

Check yourself

A brand-new viral object is cold at the edge and ten thousand users request it in the same second. What protects the origin?