Push notification gateway

One logical "send" becomes thousands of tiny deliveries — the gateway is the part that turns a message and a token list into the right call to the right platform.

The idea

Your app server wants to notify a crowd of users, but it does not talk to phones directly. It hands a message plus a list of device tokens to a gateway, which fans the send out to the right platform push service — APNs for Apple devices, FCM for Android — over long-lived connections.

The gateway's job is everything between "send this" and "the phone buzzed": route each token to its platform, batch the calls, read each per-device result, retry the transient failures, and prune the tokens the platform says are dead.

How it works

A single send carries one payload and many tokens. The gateway groups tokens by platform — Apple tokens go out over a persistent APNs connection, Android tokens over an FCM connection — and sends them in batches so it is not paying a fresh handshake per device. Keeping those connections alive (HTTP/2 streams, kept warm) is what makes the fan-out fast.
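The grouping step can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: it assumes each token arrives as a hypothetical (platform, token) pair, and the `Batch` type and the `batch_size` default of 500 are made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Batch:
    """Hypothetical container: tokens that share one platform connection."""
    platform: str                 # e.g. "apns" or "fcm"
    tokens: tuple[str, ...]

    def __iter__(self):
        return iter(self.tokens)


def by_platform(tokens: Iterable[tuple[str, str]],
                batch_size: int = 500) -> Iterator[Batch]:
    """Group (platform, token) pairs by platform, then yield fixed-size batches."""
    groups: dict[str, list[str]] = defaultdict(list)
    for platform, token in tokens:
        groups[platform].append(token)
    for platform, toks in groups.items():
        it = iter(toks)
        # islice pulls up to batch_size tokens at a time until the group is empty
        while chunk := tuple(islice(it, batch_size)):
            yield Batch(platform, chunk)


batches = list(by_platform(
    [("apns", "a1"), ("fcm", "f1"), ("apns", "a2"), ("apns", "a3")],
    batch_size=2,
))
```

With a batch size of 2, the three Apple tokens become two batches and the single Android token becomes one, so each platform connection only ever sees its own tokens.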

Every device comes back with its own result. A 200 means the platform accepted the notification for delivery. A transient failure (timeout, 429, 503) means try again later with backoff. A permanent 410 Unregistered means the token is no longer valid, usually because the app was uninstalled: that token is dead, so you prune it from the store and never send to it again.

# by_platform, pool, retry_queue, token_store and backoff are the
# gateway's own helpers; they are not defined in this snippet.
def send(message, tokens):
    for batch in by_platform(tokens):            # route + batch per platform
        conn = pool.connection(batch.platform)   # long-lived, kept warm
        for tok in batch:
            res = conn.deliver(message, tok)
            if res.ok:                           # 200: accepted by the platform
                continue
            if res.transient:                    # timeout / 429 / 503
                retry_queue.push(tok, backoff(res.attempt))
            elif res.unregistered:               # 410: app gone for good
                token_store.remove(tok)          # prune; never send here again
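
The `res.ok`, `res.transient` and `res.unregistered` flags have to come from somewhere. One way to derive them from the status codes named above is sketched below. The `Outcome` enum and `classify` function are hypothetical names, and the extra `FAILED` bucket is an assumption of this sketch: it covers permanent errors (a malformed payload, say) that should neither be retried nor cost the user their token.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ACCEPTED = "accepted"   # platform took the notification
    RETRY = "retry"         # transient: try again later with backoff
    PRUNE = "prune"         # token is dead: remove it from the store
    FAILED = "failed"       # assumed bucket: permanent, but keep the token


TRANSIENT_STATUSES = {429, 503}


def classify(status: Optional[int]) -> Outcome:
    """Map an HTTP status to an action; None stands for a timeout."""
    if status is None:
        return Outcome.RETRY
    if status == 200:
        return Outcome.ACCEPTED
    if status in TRANSIENT_STATUSES:
        return Outcome.RETRY
    if status == 410:
        return Outcome.PRUNE
    return Outcome.FAILED
```

Keeping `PRUNE` tied to a single explicit status means a burst of 503s or timeouts can never delete a healthy token.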

The retry queue drains on a backoff schedule (each attempt waits longer), and it gives up after a few tries so a genuinely broken endpoint can't be retried forever. Pruning and retrying are the two halves of keeping the token list honest.
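
A backoff schedule with a give-up limit might look like the sketch below. The constants are assumptions chosen for illustration (the text only says "a few tries"), and the random jitter is an addition of this sketch, a common way to stop many retries from landing at the same instant.

```python
import random
from typing import Optional

MAX_ATTEMPTS = 4    # assumed limit standing in for "a few tries"
BASE_DELAY = 1.0    # seconds before the first retry (assumed)
MAX_DELAY = 60.0    # upper bound on any single wait (assumed)


def backoff(attempt: int, rng: Optional[random.Random] = None) -> Optional[float]:
    """Return the wait before retry number `attempt` (1-based), or None to give up."""
    if attempt > MAX_ATTEMPTS:
        return None
    # Exponential growth: 1s, 2s, 4s, 8s, ... capped at MAX_DELAY
    ceiling = min(MAX_DELAY, BASE_DELAY * 2 ** (attempt - 1))
    source = rng or random
    # Jitter: pick a delay in the upper half of the window
    return source.uniform(ceiling / 2, ceiling)
```

When `backoff` returns `None`, the caller would drop the token from the retry queue but leave it in the token store, since a transient failure says nothing about whether the token is still valid.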

Signals & trade-offs

Lever | Effect | Watch
Bigger batches | Fewer round-trips, higher throughput | Bigger blast radius if one batch fails
Aggressive retries | Better delivery on flaky networks | Can hit provider rate limits and amplify load
Keep-alive connections | Low per-send latency, no handshake tax | Resource cost: open streams, memory, pool tuning
Eager token pruning | Fewer wasted sends, cleaner rate budget | Prune only on permanent errors, never transient

Worked example

A campaign sends one message to 50,000 users. The gateway splits the tokens — say 30,000 Apple and 20,000 Android — and pushes each set in batches over warm APNs and FCM connections. Most return 200 and are marked delivered. A few thousand hit a transient 429 during a traffic spike; those go to a retry queue and drain with exponential backoff, and nearly all land on the second attempt. About 1,200 come back 410 Unregistered — those users uninstalled — so the gateway removes those tokens from the store. Next campaign, that wasted send and its rate-limit pressure are simply gone.
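
The numbers in this example can be checked with a few lines of arithmetic. The batch size of 500 is an assumption made only for the calculation; the other figures come from the example itself.

```python
import math

apple, android = 30_000, 20_000
unregistered = 1_200
batch_size = 500    # assumed; the example does not state one

total = apple + android
batches = math.ceil(apple / batch_size) + math.ceil(android / batch_size)
remaining = total - unregistered
pruned_share = unregistered / total

print(f"{total} tokens in {batches} batches")
print(f"{remaining} tokens left after pruning ({pruned_share:.1%} removed)")
```

Under that assumption the campaign goes out as 100 batches, and the next one starts with 48,800 tokens, 2.4% fewer sends for the same audience.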

Check yourself

APNs returns 410 Unregistered for a device token. What should the gateway do?