Throttling and 429s on object storage

When you hammer one prefix too fast, the store says slow down — and the fix is to back off, not retry harder.

The idea

An object store (an S3-style blob store) accepts a limited budget of requests per partition per unit of time, say five per tick. When a burst of nine requests hits the same prefix at once, the first five fit the budget and return 200 OK; the other four overflow and come back as 429 Too Many Requests, the store's way of saying "slow down". (Status codes vary by store: Amazon S3, for one, reports throttling as 503 Slow Down.)
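
The budget model above can be sketched as a toy fixed-window limiter. This is a minimal illustration, not how any real store is implemented: the `TickBudgetStore` class, the integer `tick` argument, and the rule that a key's prefix is everything before the first `/` are all assumptions made for the example.

```python
from collections import Counter


class TickBudgetStore:
    """Toy store (hypothetical): accepts at most `budget` requests per prefix per tick."""

    def __init__(self, budget=5):
        self.budget = budget
        self.used = Counter()  # (tick, prefix) -> requests accepted so far

    def put(self, tick, key):
        prefix = key.split("/", 1)[0]  # assumption: prefix = text before first "/"
        if self.used[(tick, prefix)] >= self.budget:
            return 429                 # over budget for this tick: throttle
        self.used[(tick, prefix)] += 1
        return 200                     # within budget: accept


store = TickBudgetStore(budget=5)
statuses = [store.put(tick=0, key=f"logs/obj-{i}") for i in range(9)]
print(statuses)  # -> [200, 200, 200, 200, 200, 429, 429, 429, 429]
```

Nine requests in one tick against a budget of five yield five accepts and four throttles, which is the burst described in the paragraph.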

A 429 is not a reason to give up; it is the store asking for a pause. The right move is exponential backoff with jitter: wait a delay that doubles on each attempt, plus a small random nudge, then retry. That spreads the leftover load across later ticks until the burst drains, ideally with no requests dropped.
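
To see what the random nudge buys, the sketch below computes when several clients would retry, with and without jitter. It is an illustration under assumptions: the `retry_times` helper, the per-client seeded RNG (used only to make the run repeatable), and jitter drawn uniformly from `[0, backoff]` are choices made for this example.

```python
import random


def retry_times(n_clients, base=0.1, cap=10.0, attempts=4, jitter=True):
    """Return, per client, the cumulative times (seconds) at which it retries."""
    schedules = []
    for client_id in range(n_clients):
        rng = random.Random(client_id)  # seeded per client for repeatability
        t, times = 0.0, []
        for attempt in range(attempts):
            backoff = min(cap, base * 2 ** attempt)  # 0.1, 0.2, 0.4, 0.8
            t += backoff + (rng.uniform(0, backoff) if jitter else 0.0)
            times.append(round(t, 3))
        schedules.append(times)
    return schedules


lockstep = retry_times(3, jitter=False)
spread = retry_times(3, jitter=True)
print(lockstep[0])  # -> [0.1, 0.3, 0.7, 1.5]
print(lockstep[0] == lockstep[1] == lockstep[2])  # -> True: all clients collide
print(len({times[0] for times in spread}))  # -> 3: first retries are all distinct
```

Without jitter every client retries at exactly the same instants, so the burst simply reappears later. With jitter the retries land at different times and can fit into the budget of later ticks.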

How it works

The client wraps every request in a retry loop. On a 429 it sleeps for a delay that doubles each attempt (base · 2^attempt), capped, plus a small random jitter so that many clients don't all retry on the same beat and re-collide. If the store sends a Retry-After header, that wins. Attempts are bounded so a truly stuck request fails cleanly instead of looping forever.

import random
import time

def put_with_backoff(client, key, body, base=0.1, cap=10.0, max_attempts=6):
    """PUT with bounded retries: exponential backoff plus jitter on 429."""
    for attempt in range(max_attempts):
        resp = client.put(key, body)          # try the write
        if resp.status != 429:
            return resp                        # 200 (or a real failure) -> done
        if attempt == max_attempts - 1:
            break                              # out of attempts: don't sleep again

        # 429 Too Many Requests: back off, then retry.
        retry_after = resp.headers.get("Retry-After")
        try:
            sleep = float(retry_after)         # honor the server's hint (seconds form)
        except (TypeError, ValueError):        # header absent, or not a number
            backoff = min(cap, base * (2 ** attempt))    # 0.1, 0.2, 0.4, ...
            sleep = backoff + random.uniform(0, backoff)  # additive jitter: up to 2x backoff
        time.sleep(sleep)

    raise RuntimeError("still throttled after %d attempts" % max_attempts)
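
One detail worth a sketch: HTTP allows `Retry-After` to carry either a number of seconds or an HTTP-date, so a client that honors the header may want to accept both forms. The helper below is a hedged example; the name `retry_after_seconds` and its `now` parameter (there to make it testable) are assumptions, and a real client may prefer different fallback behavior.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value to a non-negative delay in seconds.

    Returns None when the value is neither a number nor a parseable date.
    """
    value = value.strip()
    try:
        return max(0.0, float(value))        # seconds form, e.g. "2"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):          # exception type varies by Python version
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("2"))                                             # -> 2.0
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))  # -> 60.0
print(retry_after_seconds("soon"))                                          # -> None
```

Returning `None` for an unparseable value lets the caller fall back to its own computed backoff instead of failing the request.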

Cost and trade-offs

Strategy	What happens	When to use
Retry immediately	The rejected requests slam back instantly, amplifying the burst — a thundering herd that keeps the prefix over budget.	Never for a busy prefix.
Exponential backoff + jitter	Retries spread across doubling, de-synchronized delays; the burst drains into the rate budget. Slightly higher tail latency.	The default for any throttled client.
Spread writes across prefixes	Keys hash to many partitions, so no single prefix is hot. Avoids throttling at the source.	High sustained write volume; design-time fix.
Honor `Retry-After`	Client waits exactly as long as the store asks, neither too eager nor too patient.	Whenever the header is present.

Watch out for

Retrying with no backoff: a tight retry loop re-sends the rejected requests immediately — you self-DDoS your own prefix and the burst never drains.
No jitter: many clients backing off by the exact same doubling schedule retry in lockstep and re-collide on the same tick; the random nudge is what de-synchronizes them.
Treating 429 like a 5xx: a 429 means slow down, not "broken." Failing the whole batch job on it throws away work the store was willing to take a moment later.
One hot prefix: date-prefixed keys like logs/2026-06-26/... funnel a whole day's traffic onto one partition. Spread the entropy earlier in the key.
Unbounded retries: with no attempt cap a genuinely stuck request loops forever; bound attempts and surface a clean failure.
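
For the hot-prefix case, one common way to move entropy earlier in the key is to prepend a short, stable hash shard. The sketch below is one possible scheme, not a prescribed layout: the `spread_key` helper, the choice of SHA-256, and the default of 16 shards are assumptions, and whether it helps depends on the store actually partitioning by key prefix.

```python
import hashlib


def spread_key(key, shards=16):
    """Prepend a stable hash shard so keys fan out across `shards` prefixes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    shard = digest[0] % shards          # same key always maps to the same shard
    return f"{shard:02x}/{key}"


keys = [f"logs/2026-06-26/event-{i}.json" for i in range(1000)]
prefixes = {spread_key(k).split("/", 1)[0] for k in keys}
print(len(prefixes) > 1)  # -> True: one day's keys now span several prefixes
```

The trade-off is on the read side: listing one day's objects now means listing under every shard prefix and merging the results.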

Worked example

A batch job fires 9 writes at one prefix in a single tick. The budget is 5 per tick. Requests 1–5 return 200 OK; 6–9 come back 429. The client doesn't quit. It retries each throttled write after a backoff that starts at 100 ms and doubles on every further 429 (200 ms, then 400 ms), plus a little jitter so the four don't retry at the same instant.

Jitter spreads the four retries out: two arrive in tick 2, which has spare budget, and the other two land in tick 3. Total wall time is roughly the longest backoff plus a couple of ticks, under a second here, and no requests are dropped. Backoff turned a spike that exceeded the budget into a smooth flow that fit inside it.
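
The example can be replayed as a small tick-based simulation. This is a sketch under simplifying assumptions: delays are measured in whole ticks rather than milliseconds, requests within a tick are served in arrival order, and `simulate_burst` is a hypothetical helper written for this illustration.

```python
import random
from collections import defaultdict


def simulate_burst(n_requests=9, budget=5, max_attempts=6, seed=0):
    """Replay a burst against a per-tick budget; return (landed, dropped)."""
    rng = random.Random(seed)
    due = defaultdict(list)  # tick -> [(request_id, attempt)]
    due[1] = [(i, 0) for i in range(1, n_requests + 1)]
    landed, dropped = {}, []  # request_id -> tick of its 200; ids that gave up
    tick = 1
    while due:
        batch = due.pop(tick, [])
        for pos, (req, attempt) in enumerate(batch):
            if pos < budget:
                landed[req] = tick                  # fits the budget: 200 OK
            elif attempt + 1 >= max_attempts:
                dropped.append(req)                 # bounded retries: give up
            else:
                backoff = 2 ** attempt              # 1, 2, 4, ... ticks
                delay = backoff + rng.randint(0, backoff)  # plus jitter
                due[tick + delay].append((req, attempt + 1))
        tick += 1
    return landed, dropped


landed, dropped = simulate_burst()
print(len(landed), dropped)  # -> 9 []
```

Requests 1 to 5 land in tick 1. Each of the four throttled requests waits one or two ticks, so it lands in tick 2 or tick 3; the exact split depends on the random seed, but all nine writes succeed and none are dropped.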

Check yourself

Your nightly batch job gets a 429 Too Many Requests from the object store while writing to one prefix. What is the best response?