Push notification gateway

One logical "send" becomes thousands of tiny deliveries — the gateway is the part that turns a message and a token list into the right call to the right platform.

The idea

Your app server wants to notify a crowd of users, but it does not talk to phones directly. It hands a message plus a list of device tokens to a gateway, which fans the send out to the right platform push service — APNs for Apple devices, FCM for Android — over long-lived connections.

The gateway's job is everything between "send this" and "the phone buzzed": route each token to its platform, batch the calls, read each per-device result, retry the transient failures, and prune the tokens the platform says are dead.

How it works

A single send carries one payload and many tokens. The gateway groups tokens by platform (Apple tokens go out over a persistent APNs connection, Android tokens over an FCM connection) and sends them in batches so it does not pay for a fresh TCP and TLS handshake per device. Keeping those connections open and warm, with many requests multiplexed as HTTP/2 streams on each one, is what makes the fan-out fast.
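
The grouping step can be sketched as follows. This is a minimal illustration, not the gateway's actual code: `DeviceToken`, `Batch`, the `"apns"`/`"fcm"` labels, and the default `batch_size` of 500 are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class DeviceToken:
    platform: str   # "apns" or "fcm" (labels assumed for this sketch)
    value: str


@dataclass
class Batch:
    platform: str
    tokens: list[DeviceToken]

    def __iter__(self):
        return iter(self.tokens)


def by_platform(tokens: list[DeviceToken], batch_size: int = 500) -> Iterator[Batch]:
    """Group tokens by platform, then yield fixed-size batches per platform."""
    groups: dict[str, list[DeviceToken]] = defaultdict(list)
    for tok in tokens:
        groups[tok.platform].append(tok)
    for platform, group in groups.items():
        # Slice each platform's tokens into chunks of at most batch_size.
        for start in range(0, len(group), batch_size):
            yield Batch(platform, group[start:start + batch_size])
```

With three Apple tokens, two Android tokens, and `batch_size=2`, this yields an APNs batch of two, an APNs batch of one, and an FCM batch of two. A batch never mixes platforms, so each one maps to exactly one pooled connection.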

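Every device comes back with its own result. A 200 means the push service accepted the notification for delivery; it does not confirm that the phone displayed it. A transient failure (timeout, 429, 503) means try again later with backoff. A permanent 410 Unregistered from APNs (FCM reports the same condition as an UNREGISTERED error) means the token is no longer valid, usually because the app was uninstalled. That token is dead, so you drop it and prune it from the store so you never send to it again.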
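
The outcomes above can be written as a small classifier. This is a sketch under assumptions: the `Outcome` names and the `classify` signature are invented for illustration, the transient set is just the statuses named in the text, and a real gateway would also inspect the provider's error reason for other permanent failures.

```python
from enum import Enum


class Outcome(Enum):
    DELIVERED = "delivered"   # accepted by the push service
    RETRY = "retry"           # transient: try again with backoff
    PRUNE = "prune"           # token is dead: remove it from the store
    FAILED = "failed"         # other permanent error: log it, do not retry


TRANSIENT_STATUSES = {429, 503}   # the transient statuses named in the text


def classify(status, reason=""):
    """Map one per-device response to the gateway's next action.

    status is the HTTP status code, or None when the request timed out.
    reason is the provider's error string, if any.
    """
    if status is None:
        return Outcome.RETRY
    if status == 200:
        return Outcome.DELIVERED
    # APNs answers 410 with reason "Unregistered"; FCM's HTTP v1 API uses
    # the error code "UNREGISTERED", so the reason is matched case-insensitively.
    if status == 410 or reason.upper() == "UNREGISTERED":
        return Outcome.PRUNE
    if status in TRANSIENT_STATUSES:
        return Outcome.RETRY
    return Outcome.FAILED
```

Keeping the decision in one pure function makes it easy to test and keeps the rule "prune only on permanent errors" in a single place.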

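MAX_ATTEMPTS = 4  # give up after a few tries (assumed limit)

def send(message, tokens, attempt=1):
    # by_platform, pool, retry_queue, token_store, backoff, log_failure: gateway helpers defined elsewhere
    for batch in by_platform(tokens):            # route + batch per platform
        conn = pool.connection(batch.platform)   # long-lived, kept warm
        for tok in batch:
            res = conn.deliver(message, tok)
            if res.ok:                           # 200: accepted by the push service
                continue
            if res.unregistered:                 # 410 Unregistered: app gone for good
                token_store.remove(tok)          # prune; never send here again
            elif res.transient and attempt < MAX_ATTEMPTS:   # timeout / 429 / 503
                retry_queue.push(tok, attempt + 1, backoff(attempt))
            else:                                # other permanent error, or out of attempts
                log_failure(tok, res)            # hypothetical helper: record it and stop retrying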

The retry queue drains on a backoff schedule (each attempt waits longer), and it gives up after a few tries so a genuinely broken endpoint can't be retried forever. Pruning and retrying are the two halves of keeping the token list honest.
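
One way to get that behaviour is a heap ordered by due time plus an exponential backoff function. The sketch below is an assumption-laden illustration: `MAX_ATTEMPTS`, `BASE_DELAY`, `MAX_DELAY`, the jitter scheme, and the `RetryQueue` interface are all chosen for the example rather than taken from any particular gateway.

```python
import heapq
import random

MAX_ATTEMPTS = 4     # assumed cap standing in for "a few tries"
BASE_DELAY = 1.0     # seconds, assumed
MAX_DELAY = 60.0     # seconds, assumed ceiling


def backoff(attempt, rng=random.random):
    """Exponential backoff with jitter.

    The ceiling doubles per attempt (capped at MAX_DELAY); the actual wait
    is drawn from the upper half of that ceiling so retries spread out.
    """
    ceiling = min(MAX_DELAY, BASE_DELAY * 2 ** (attempt - 1))
    return ceiling * (0.5 + 0.5 * rng())


class RetryQueue:
    """Min-heap of (due_time, seq, token, next_attempt); earliest due pops first."""

    def __init__(self):
        self._heap = []
        self._seq = 0   # tie-breaker so tokens themselves are never compared

    def push(self, token, attempt, now):
        """Schedule a retry after failed attempt number `attempt`.

        Returns False once the token has used up its attempts.
        """
        if attempt >= MAX_ATTEMPTS:
            return False
        due = now + backoff(attempt)
        heapq.heappush(self._heap, (due, self._seq, token, attempt + 1))
        self._seq += 1
        return True

    def pop_due(self, now):
        """Return every (token, next_attempt) whose wait has elapsed."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, _, token, attempt = heapq.heappop(self._heap)
            ready.append((token, attempt))
        return ready
```

The clock is passed in as `now` rather than read inside the queue, which keeps the sketch deterministic under test. The jitter matters when thousands of tokens fail in the same instant: without it they would all retry in the same instant too.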

Signals & trade-offs

Lever	Effect	Watch
Bigger batches	Fewer round-trips, higher throughput	Bigger blast radius if one batch fails
Aggressive retries	Better delivery on flaky networks	Can hit provider rate limits and amplify load
Keep-alive connections	Low per-send latency, no handshake tax	Resource cost — open streams, memory, pool tuning
Eager token pruning	Fewer wasted sends, cleaner rate budget	Prune only on permanent errors, never transient

Watch out for

Retrying permanent failures forever. A 410 Unregistered will never succeed. Retrying it just burns the queue and your rate budget — drop it on the first permanent response.
Not pruning dead tokens. Every uninstalled device you keep sending to is a wasted call that still counts against provider quotas, slowly starving real deliveries.
Ignoring provider rate limits. APNs and FCM push back when you exceed them. Without throttling, a big fan-out gets 429s and your overall delivery rate drops.
No idempotency on retries. If a "transient failure" actually delivered, a blind retry sends the notification twice. Tag sends with a collapse or dedupe key so a phone buzzes once.
Fan-out with no backpressure. Dumping a million tokens into the gateway at once overwhelms it. Meter the fan-out into the connection pool so that under overload the gateway slows intake or sheds load instead of crashing.
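
For the rate-limit and backpressure items above, a token bucket is one common way to meter sends. The sketch below is illustrative only: the class name, its parameters, and the injectable `clock` are assumptions, and the internal counter is called "permits" to avoid confusion with device tokens.

```python
import time


class TokenBucket:
    """Allow `rate` sends per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._permits = float(capacity)   # start full so the first burst goes through
        self._last = clock()

    def try_acquire(self, n=1):
        """Take n permits if available; return False so the caller can wait or shed."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._permits = min(self.capacity, self._permits + (now - self._last) * self.rate)
        self._last = now
        if self._permits >= n:
            self._permits -= n
            return True
        return False
```

A sender would call `try_acquire()` before each delivery and, on `False`, leave the token in its queue until the next tick. Setting `rate` somewhat below the provider's limit leaves headroom for retries.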

Worked example

A campaign sends one message to 50,000 users. The gateway splits the tokens — say 30,000 Apple and 20,000 Android — and pushes each set in batches over warm APNs and FCM connections. Most return 200 and are marked delivered. A few thousand hit a transient 429 during a traffic spike; those go to a retry queue and drain with exponential backoff, and nearly all land on the second attempt. About 1,200 come back 410 Unregistered — those users uninstalled — so the gateway removes those tokens from the store. Next campaign, that wasted send and its rate-limit pressure are simply gone.
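
The arithmetic behind that last sentence, using only the figures given in the example (variable names are illustrative):

```python
total = 50_000
apple, android = 30_000, 20_000
unregistered = 1_200             # came back 410 Unregistered

assert apple + android == total  # the platform split covers every token

remaining = total - unregistered      # tokens left in the store after pruning
wasted_share = unregistered / total   # fraction of this campaign's sends that were dead

print(remaining)                 # 48800
print(f"{wasted_share:.1%}")     # 2.4%
```

So the next campaign starts from 48,800 tokens, and the 2.4% of calls that went to dead tokens this time are not repeated.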

Check yourself

APNs returns 410 Unregistered for a device token. What should the gateway do?