One logical "send" becomes thousands of tiny deliveries — the gateway is the part that turns a message and a token list into the right call to the right platform.
Your app server wants to notify a crowd of users, but it does not talk to phones directly. It hands a message plus a list of device tokens to a gateway, which fans the send out to the right platform push service — APNs for Apple devices, FCM for Android — over long-lived connections.
The gateway's job is everything between "send this" and "the phone buzzed": route each token to its platform, batch the calls, read each per-device result, retry the transient failures, and prune the tokens the platform says are dead.
A single send carries one payload and many tokens. The gateway groups tokens by platform — Apple tokens go out over a persistent APNs connection, Android tokens over an FCM connection — and sends them in batches so it is not paying a fresh handshake per device. Keeping those connections alive (HTTP/2 streams, kept warm) is what makes the fan-out fast.
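Here is a minimal sketch of the routing-and-batching step. It assumes each token record carries a platform tag. `DeviceToken`, `Batch`, the `"apns"`/`"fcm"` labels, and the default batch size of 500 are hypothetical choices for illustration, not values either provider prescribes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class DeviceToken:
    value: str
    platform: str  # hypothetical tag: "apns" or "fcm"


@dataclass
class Batch:
    platform: str
    tokens: list[DeviceToken]


def by_platform(tokens: list[DeviceToken], batch_size: int = 500) -> Iterator[Batch]:
    """Group tokens by platform, then cut each group into fixed-size batches."""
    groups: dict[str, list[DeviceToken]] = defaultdict(list)
    for tok in tokens:
        groups[tok.platform].append(tok)  # route: one bucket per push service
    for platform, toks in groups.items():
        for start in range(0, len(toks), batch_size):
            yield Batch(platform, toks[start:start + batch_size])  # batch
```

Eight tokens (five Apple, three Android) with `batch_size=2` produce Apple batches of 2, 2 and 1 and Android batches of 2 and 1. Each batch would then go out over that platform's warm connection.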
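Every device comes back with its own result. A 200 means delivered. A transient failure (a timeout, 429, or 503) means try again later with backoff. A permanent "unregistered" response means the token is no longer valid, usually because the app was uninstalled. That token is dead, so drop it and prune it from the store so you never send to it again. APNs signals this as 410 with reason Unregistered. FCM's HTTP v1 API reports the same condition as 404 with error code UNREGISTERED.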
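The result handling can be sketched as a small classifier. This sketch assumes the caller passes the platform, the HTTP status (`None` for a timeout), and the provider's reason or error code. The `Outcome` enum and the set of statuses treated as transient are hypothetical policy choices. Treating 500 as transient is an assumption, and so is the `FAILED` bucket, which covers errors that are neither retryable nor proof of a dead token.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DELIVERED = "delivered"
    TRANSIENT = "transient"  # retry later with backoff
    PERMANENT = "permanent"  # token is dead: prune it
    FAILED = "failed"        # assumption: neither retry nor prune; log it


# Hypothetical policy: which statuses count as "try again later".
TRANSIENT_STATUSES = {429, 500, 503}


def classify(platform: str, status: Optional[int], reason: str = "") -> Outcome:
    """Map one per-device response to an outcome; status=None means a timeout."""
    if status == 200:
        return Outcome.DELIVERED
    if status is None or status in TRANSIENT_STATUSES:
        return Outcome.TRANSIENT
    # APNs: 410 means the token is no longer active (e.g. reason "Unregistered").
    if platform == "apns" and status == 410:
        return Outcome.PERMANENT
    # FCM HTTP v1: the same condition arrives as 404 with error code UNREGISTERED.
    if platform == "fcm" and status == 404 and reason == "UNREGISTERED":
        return Outcome.PERMANENT
    return Outcome.FAILED
```

Keying the permanent check on both platform and status matters. The two providers use different codes for the same condition, and a 404 from APNs means a bad request path, not a dead token, so it must not prune anything.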
```python
def send(message, tokens):
    for batch in by_platform(tokens):              # route + batch per platform
        conn = pool.connection(batch.platform)     # long-lived, kept warm
        for tok in batch:
            res = conn.deliver(message, tok)
            if res.ok:                             # 200: delivered
                continue
            elif res.transient:                    # timeout / 429 / 503
                retry_queue.push(tok, backoff(res.attempt))
            elif res.unregistered:                 # 410: app gone for good
                token_store.remove(tok)            # prune; never send here again
```
The retry queue drains on a backoff schedule (each attempt waits longer), and it gives up after a few tries so a genuinely broken endpoint can't be retried forever. Pruning and retrying are the two halves of keeping the token list honest.
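The following sketches that schedule, assuming an in-process queue. It combines exponential backoff with "full jitter" (a random wait below a ceiling that doubles each attempt) and a hard cap on attempts. `MAX_ATTEMPTS`, the delay constants, `next_delay`, and `RetryQueue` are all hypothetical names and values.

```python
import heapq
import random
from typing import Optional

MAX_ATTEMPTS = 4     # hypothetical cap: give up after this many retries
BASE_DELAY_S = 1.0   # hypothetical ceiling for the first retry, in seconds
MAX_DELAY_S = 60.0   # hypothetical ceiling for any single wait


def next_delay(attempt: int) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based), or None to give up."""
    if attempt > MAX_ATTEMPTS:
        return None
    # Ceiling doubles per attempt (1, 2, 4, 8, ...) up to MAX_DELAY_S.
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))
    # Full jitter: a random wait below the ceiling spreads retries out in time.
    return random.uniform(0, ceiling)


class RetryQueue:
    """Tokens waiting for a retry, ordered by the time they become due."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []

    def push(self, token: str, attempt: int, now: float) -> bool:
        """Schedule a retry; returns False once the token has used up its attempts."""
        delay = next_delay(attempt)
        if delay is None:
            return False
        heapq.heappush(self._heap, (now + delay, attempt, token))
        return True

    def due(self, now: float) -> list[tuple[str, int]]:
        """Pop every (token, attempt) whose wait has elapsed by `now`."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, attempt, token = heapq.heappop(self._heap)
            ready.append((token, attempt))
        return ready
```

The jitter is the design choice worth noting. Without it, thousands of tokens that failed together during a spike would all retry at the same instant and recreate the spike.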
| Lever | Effect | Watch |
|---|---|---|
| Bigger batches | Fewer round-trips, higher throughput | Bigger blast radius if one batch fails |
| Aggressive retries | Better delivery on flaky networks | Can hit provider rate limits and amplify load |
| Keep-alive connections | Low per-send latency, no handshake tax | Resource cost — open streams, memory, pool tuning |
| Eager token pruning | Fewer wasted sends, more of the rate budget left for live tokens | Prune only on permanent errors, never on transient ones |
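The keep-alive lever comes down to paying the connection setup once per platform and then reusing it. Below is a minimal sketch of a pool exposing the `connection(platform)` method the send loop above calls. The `factory` argument stands in for whatever actually opens an HTTP/2 connection to APNs or FCM, and the whole class is hypothetical.

```python
from typing import Any, Callable


class ConnectionPool:
    """Open one connection per platform on first use, then keep reusing it."""

    def __init__(self, factory: Callable[[str], Any]) -> None:
        self._factory = factory            # hypothetical: opens a real connection
        self._conns: dict[str, Any] = {}

    def connection(self, platform: str) -> Any:
        conn = self._conns.get(platform)
        if conn is None:
            conn = self._factory(platform)  # pay the handshake once
            self._conns[platform] = conn
        return conn

    def drop(self, platform: str) -> None:
        """Forget a broken connection so the next call reconnects."""
        self._conns.pop(platform, None)
```

This sketch keeps exactly one connection per platform. Deciding how many connections to open and how many concurrent streams each one carries is the pool tuning the table warns about.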
410 Unregistered will never succeed. Retrying it just burns the queue and your rate budget, so drop the token on the first permanent response.

…429s and your overall delivery rate drops.

A campaign sends one message to 50,000 users. The gateway splits the tokens (say 30,000 Apple and 20,000 Android) and pushes each set in batches over warm APNs and FCM connections. Most return 200 and are marked delivered. A few thousand hit a transient 429 during a traffic spike. Those go to the retry queue and drain with exponential backoff, and nearly all land on the second attempt. About 1,200 come back unregistered because those users uninstalled the app, so the gateway removes those tokens from the store. By the next campaign, that wasted send and its rate-limit pressure are gone.
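The bookkeeping at the end of that campaign can be sketched as a tally plus a prune pass. The string outcome labels and `settle()` are hypothetical. The point is that only permanent results shrink the store.

```python
from collections import Counter


def settle(results: dict[str, str], token_store: set[str]) -> Counter:
    """Tally one campaign's outcomes and prune tokens that came back permanent.

    `results` maps token -> "delivered", "transient", or "permanent"
    (hypothetical labels). Only "permanent" tokens leave the store; transient
    ones are still live and stay for the retry queue.
    """
    tally = Counter(results.values())
    for token, outcome in results.items():
        if outcome == "permanent":
            token_store.discard(token)  # discard: no error if already gone
    return tally
```

With the campaign's numbers, removing the 1,200 unregistered tokens from 50,000 leaves 48,800 for the next send. The transient ones stay in the store because those devices are still live.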
APNs returns 410 Unregistered for a device token. What should the gateway do?