Webhook ordering and retry protocol

Deliver every event at least once, let duplicates be harmless, and never let a later event for the same key slip past one that hasn’t landed yet.

The idea

A webhook sender pushes events to a consumer over HTTP. The network is flaky, so the sender retries until it sees a 2xx acknowledgement — that is at-least-once delivery, which means the consumer can see the same event more than once. To make duplicates harmless, every event carries an idempotency key: the consumer applies it once and ignores replays.
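"Applies it once" only holds across a consumer crash if the idempotency key and the side effect are recorded together. The following is a minimal sketch of that idea, assuming a SQLite store; the table names `seen` and `ledger` and the helpers `make_store` and `on_event` are hypothetical stand-ins, not part of the protocol.

```python
import sqlite3

def make_store():
    # In-memory database for the sketch; a real consumer would use durable storage.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE seen (event_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE ledger (event_id TEXT, payload TEXT)")
    db.commit()
    return db

def on_event(db, event_id, payload):
    """Apply the event once; answer 200 to replays without repeating the side effect."""
    try:
        with db:  # one transaction: commits on success, rolls back on an exception
            # The PRIMARY KEY rejects a second insert of the same idempotency key.
            db.execute("INSERT INTO seen (event_id) VALUES (?)", (event_id,))
            # Stand-in for the real side effect, written in the same transaction.
            db.execute("INSERT INTO ledger (event_id, payload) VALUES (?, ?)",
                       (event_id, payload))
    except sqlite3.IntegrityError:
        pass  # duplicate delivery: the event was already applied
    return 200

db = make_store()
on_event(db, "evt-1", "charge 10")
on_event(db, "evt-1", "charge 10")  # redelivery of the same event
rows = db.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]  # 1
```

Because both inserts share one transaction, a crash between them leaves neither behind, so the retry applies the event cleanly instead of skipping or repeating it.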

Order matters only within a key (one order, one account). The sender stamps each event with a per-key sequence number and refuses to deliver seq n+1 until seq n is acked; that is head-of-line blocking. Different keys are independent, so they flow in parallel. After a bounded number of attempts, a stuck event moves to a dead-letter queue instead of being retried forever; later events for that key stay held until it is replayed or otherwise resolved.
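The per-key gate can be sketched in a few lines. This is an illustration under assumptions: the class name `KeyGate` and its methods `offer` and `ack` are hypothetical, and sequences are assumed to start at 1 for every key.

```python
from collections import defaultdict

class KeyGate:
    """Releases events for each key strictly in sequence order."""

    def __init__(self):
        self.next_expected = defaultdict(lambda: 1)  # assumption: sequences start at 1
        self.held = defaultdict(dict)                # key -> {seq: event}

    def offer(self, key, seq, event):
        """Return the event if it is next for its key, else hold it and return None."""
        if seq == self.next_expected[key]:
            return event
        self.held[key][seq] = event
        return None

    def ack(self, key):
        """Record the ack for the current event; return the next held event, if any."""
        self.next_expected[key] += 1
        return self.held[key].pop(self.next_expected[key], None)

gate = KeyGate()
first = gate.offer("A", 2, "A:2")    # None: A:1 has not been acked yet
second = gate.offer("B", 1, "B:1")   # "B:1": a different key is not blocked
third = gate.offer("A", 1, "A:1")    # "A:1": next in line for key A
released = gate.ack("A")             # "A:2": released once A:1 is acked
```

The gate never compares sequence numbers across keys, which is what lets key B proceed while key A waits.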

How it works

// SENDER: per consumer endpoint, one in-flight delivery per key
deliver(event):                       // event = {id, key, seq, payload}
    while event.seq != next_expected[event.key]:   // head-of-line: wait turn
        hold(event)

    attempt = 0
    backoff = base                    // e.g. 1 second
    loop:
        resp = POST(endpoint, event,  // a timeout counts as a failed attempt
                    headers={ "Idempotency-Key": event.id,
                              "X-Seq": event.seq, "X-Key": event.key })
        if resp.status in 2xx:        // ACK
            ack(event)
            next_expected[event.key] += 1   // release the next same-key event
            return DELIVERED
        attempt += 1
        if attempt >= MAX_ATTEMPTS:   // MAX_ATTEMPTS counts POSTs, including the first
            dead_letter(event)        // park it; do NOT advance the key
            return DEAD_LETTERED      // later same-key events stay blocked
        sleep(backoff + random_jitter())   // exponential backoff + jitter
        backoff = min(backoff * 2, cap)

// CONSUMER: idempotent apply, dedup by idempotency key
on_event(event):
    if seen.contains(event.id):       // duplicate redelivery
        return 200                    // ack again, no side effect
    apply(event.payload)              // the real, ordered work
    seen.add(event.id)                // record atomically with apply, or a crash
                                      // between the two re-applies on retry
    return 200
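The sender's retry loop can be exercised without a network by injecting the transport and the clock. The sketch below makes several assumptions: `post` and `sleep` are hypothetical injected callables, `post` returns an HTTP status or `None` on timeout, `max_attempts` is the total number of POSTs, and jitter is drawn from up to half the current backoff. Head-of-line gating is left out to keep the focus on retries.

```python
import random

def deliver(event, post, sleep, max_attempts=5, base=1.0, cap=60.0):
    """POST until a 2xx response arrives or max_attempts POSTs have failed."""
    backoff = base
    for attempt in range(1, max_attempts + 1):
        status = post(event)                   # hypothetical transport; None = timeout
        if status is not None and 200 <= status < 300:
            return "DELIVERED", attempt
        if attempt == max_attempts:
            break                              # do not sleep before dead-lettering
        sleep(backoff + random.uniform(0, backoff / 2))  # backoff plus jitter
        backoff = min(backoff * 2, cap)
    return "DEAD_LETTERED", max_attempts

# Usage: a fake endpoint that times out, returns 503, then succeeds.
responses = iter([None, 503, 200])
waits = []
outcome = deliver({"id": "evt-1"}, post=lambda e: next(responses), sleep=waits.append)
# outcome is ("DELIVERED", 3); waits holds the two sleeps that preceded the success
```

Passing `sleep` in as a parameter lets a test record the waits instead of actually pausing, while production code could pass `time.sleep`.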

Cost

Mechanism	Guarantee	Cost
At-least-once + idempotency key	No event is silently lost; replays are safe	Consumer must store seen keys and dedup
Per-key ordering (seq + HOL block)	Same-key events apply in order	One stuck event stalls only its own key
Global ordering	Everything applies in one total order	One stuck event stalls all traffic — usually too costly
Exponential backoff + jitter	A struggling consumer gets breathing room; retries spread out	Higher tail latency for a flapping event
Dead-letter queue after max attempts	A poison event can’t retry forever; it is parked where it can be inspected	That key’s later events stay blocked until manual or replay recovery
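The latency cost of backoff is easy to put a number on. As a rough sketch, assuming a 1 second base, a 60 second cap, and 8 attempts (all illustrative values; `backoff_schedule` is a hypothetical helper), the delays before jitter are:

```python
def backoff_schedule(max_attempts, base=1.0, cap=60.0):
    """Delays slept between consecutive attempts, before jitter is added."""
    delays = []
    backoff = base
    for _ in range(max_attempts - 1):   # no sleep after the final attempt
        delays.append(backoff)
        backoff = min(backoff * 2, cap)
    return delays

schedule = backoff_schedule(8)
# schedule == [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]; sum(schedule) == 123.0
```

Under these assumptions a failing event holds its key for about two minutes of sleep time, plus request time and jitter, before it is dead-lettered.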

Watch out for

Retrying same-key events in parallel — a retry of seq 1 and a fresh seq 2 race, and 2 can land first. Keep one in-flight delivery per key.
No idempotency key, so a redelivered event runs its side effect twice — a second charge, a duplicate email. At-least-once requires dedup.
Retrying immediately with no backoff hammers a consumer that is already failing, turning a blip into an outage. Always back off, and add jitter so retries don’t synchronise.
Assuming global ordering when the contract is only per-key. Two different keys may arrive in any relative order — design for it.
No dead-letter path, so one poison event retries forever and blocks every later event for its key. Cap attempts and park failures.

Worked example

An order service emits order.created (key A, seq 1) then order.updated (key A, seq 2) for the same order_id. If updated arrives at the consumer first, it tries to update a row that doesn’t exist yet — a lost or corrupted update.

So the sender holds order.updated until order.created is acked. Suppose the first POST of created times out. The sender does not fire updated in the meantime; it retries created after 1s, then 2s, then 4s (each plus a little jitter). When created finally returns 2xx, the sender advances key A's next expected sequence from 1 to 2, and updated is released and delivered, now applying to a row that exists. Meanwhile an unrelated payment.settled (key B, seq 1) was never blocked: different key, delivered in parallel. And if the network double-delivers created, the consumer sees the same idempotency key, returns 200, and skips the side effect.
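The scenario above can be replayed as a small single-threaded simulation. It is a sketch only: the event ids (`evt-created` and so on) are hypothetical, backoff delays are omitted, and "parallel" delivery is modelled as one round-robin pass over the keys.

```python
from collections import deque

applied, seen = [], set()
fail_once = {"evt-created"}           # the first POST of order.created times out

def consumer(event):
    """Return 200 on success, None to simulate a timeout."""
    if event["id"] in fail_once:
        fail_once.discard(event["id"])
        return None
    if event["id"] not in seen:       # dedup by idempotency key
        seen.add(event["id"])
        applied.append(event["type"])
    return 200

queues = {                            # one FIFO per key, already in seq order
    "A": deque([{"id": "evt-created", "type": "order.created"},
                {"id": "evt-updated", "type": "order.updated"}]),
    "B": deque([{"id": "evt-settled", "type": "payment.settled"}]),
}

# Each round sends at most the head event of every key (one in flight per key).
while any(queues.values()):
    for key, q in queues.items():
        if q and consumer(q[0]) == 200:
            q.popleft()               # acked: release the next same-key event

consumer({"id": "evt-created", "type": "order.created"})  # network duplicate
# applied == ["payment.settled", "order.created", "order.updated"]
```

payment.settled lands first because key B was never waiting on key A, order.updated only goes out after order.created is acked, and the duplicate adds nothing to `applied`.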

Check yourself

Event A:1 is timing out and being retried. Event A:2 (same key) is ready to send. What should the sender do with A:2?

The consumer’s network duplicates a delivery, so it receives order.created twice with the same idempotency key. What makes this safe?