Throttling and 429s on object storage

When you hammer one prefix too fast, the store says slow down — and the fix is to back off, not retry harder.

The idea

An object store (an S3-style blob store) accepts a limited budget of requests per partition per unit of time, say five per tick. When a burst of nine requests hits the same prefix at once, the first five fit the budget and return 200 OK; the remaining four overflow and come back as 429 Too Many Requests. (Amazon S3 itself signals the same condition with 503 Slow Down; the handling is identical.)

A 429 is not a failure to give up on; it is the store asking for a pause. The right move is exponential backoff with jitter: wait a delay that doubles on each attempt, plus a random nudge, then retry. That spreads the leftover load across later ticks until the burst drains, and in the normal case no request is dropped.
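To make the budget concrete, here is a toy model of the store described above. It is only a sketch: `TickBudgetStore` and its `put(tick)` method are hypothetical names invented for this illustration, and real stores enforce their limits in ways this article does not describe.

```python
from collections import Counter

class TickBudgetStore:
    """Toy store: admits at most `budget` requests per tick, rejects the rest."""

    def __init__(self, budget=5):
        self.budget = budget
        self.used = Counter()  # tick -> requests admitted so far in that tick

    def put(self, tick):
        """Return 200 if the tick still has budget, otherwise 429."""
        if self.used[tick] < self.budget:
            self.used[tick] += 1
            return 200
        return 429

store = TickBudgetStore(budget=5)
statuses = [store.put(tick=0) for _ in range(9)]  # nine writes in one tick
print(statuses)  # [200, 200, 200, 200, 200, 429, 429, 429, 429]
```

The four rejected writes are not lost; they only need to arrive in a later tick, which is what backoff arranges.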

How it works

The client wraps every request in a retry loop. On a 429 it sleeps for a delay that doubles each attempt (base · 2^attempt, capped at a maximum), plus random jitter so that many clients don't all retry on the same beat and collide again. If the store sends a Retry-After header, the client sleeps for that long instead. Attempts are bounded so a truly stuck request fails cleanly instead of looping forever.

import random
import time

def put_with_backoff(client, key, body, base=0.1, cap=10.0, max_attempts=6):
    """Write one object, backing off exponentially (plus jitter) on 429."""
    for attempt in range(max_attempts):
        resp = client.put(key, body)          # try the write
        if resp.status != 429:
            return resp                        # 200 (or a non-throttling error) -> done
        if attempt == max_attempts - 1:
            break                              # out of attempts: fail now, don't sleep first

        # 429 Too Many Requests: back off, then retry.
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            sleep = float(retry_after)         # honor the server's hint (assumes seconds)
        else:
            backoff = min(cap, base * (2 ** attempt))    # 0.1, 0.2, 0.4, ...
            sleep = backoff + random.uniform(0, backoff)  # additive jitter: 1x to 2x backoff
        time.sleep(sleep)

    raise RuntimeError("still throttled after %d attempts" % max_attempts)
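The loop above passes Retry-After straight to float(), which works only when the header carries a number of seconds. HTTP also allows the header to be a date, so a more defensive client might parse both forms. The helper below is a sketch under that assumption; `retry_after_seconds` is a hypothetical name, not part of any client library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, now=None):
    """Parse a Retry-After value (seconds or HTTP-date); None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)                     # delta-seconds form, e.g. "3"
    try:
        when = parsedate_to_datetime(value)     # HTTP-date form
    except (TypeError, ValueError):
        return None                             # unparseable: caller falls back to backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # never sleep a negative time

fixed_now = datetime(2015, 10, 21, 7, 27, 30, tzinfo=timezone.utc)
print(retry_after_seconds("3"))                                         # 3.0
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", fixed_now))  # 30.0
```

When the helper returns None, the caller would fall back to the computed backoff, exactly as the loop does when the header is absent.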

Cost and trade-offs

Strategy | What happens | When to use
Retry immediately | The rejected requests slam back instantly, amplifying the burst: a thundering herd that keeps the prefix over budget. | Never for a busy prefix.
Exponential backoff + jitter | Retries spread across doubling, de-synchronized delays; the burst drains into the rate budget. Slightly higher tail latency. | The default for any throttled client.
Spread writes across prefixes | Keys hash to many partitions, so no single prefix is hot. Avoids throttling at the source. | High sustained write volume; design-time fix.
Honor Retry-After | Client waits exactly as long as the store asks, neither too eager nor too patient. | Whenever the header is present.
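The "spread writes across prefixes" row can be sketched as a key-naming helper. This assumes the store partitions by leading key characters, as the table implies; `spread_key` and the shard count of 16 are illustrative choices, not a documented API.

```python
import hashlib

def spread_key(key, shards=16):
    """Prefix `key` with a stable, hash-derived shard so writes fan out."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    shard = digest[0] % shards           # same key always maps to the same shard
    return f"{shard:02x}/{key}"

keys = [f"logs/2024-01-01/part-{i}" for i in range(1000)]
prefixes = {spread_key(k).split("/", 1)[0] for k in keys}
print(len(prefixes))  # number of distinct leading prefixes now in use
```

The trade-off is that keys sharing a logical prefix no longer sit together, so listing one day's objects means listing every shard.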

Worked example

A batch job fires 9 writes at one prefix in a single tick. The budget is 5 per tick. Requests 1–5 return 200 OK; 6–9 come back 429. The client doesn't quit: each rejected write waits about 100 ms plus a little jitter before its first retry, and would wait 200 ms, then 400 ms, if it were throttled again. The jitter keeps the four from retrying at the same instant.

The jitter spreads the four retries over later ticks: two land in tick 2 and the last two in tick 3, both of which have spare budget. Total wall time is roughly the longest backoff actually taken plus a couple of ticks, under a second, and 0 requests are ultimately dropped. Backoff turned a spike that exceeded the budget into a smooth flow that fit inside it.
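The example can be replayed with a small simulation. It is a sketch resting on assumptions the article does not state: a 50 ms tick, ticks numbered from 0, and the same additive jitter as the retry loop. `simulate` and its parameters are hypothetical names.

```python
import random

def simulate(n_requests=9, budget=5, tick_ms=50, base_ms=100, cap_ms=10_000,
             max_attempts=6, immediate=False, seed=0):
    """Replay a burst against a per-tick budget; return (admitted_per_tick, dropped)."""
    rng = random.Random(seed)
    pending = [(0.0, 0)] * n_requests   # (due time in ms, attempt number)
    admitted = {}                       # tick index -> number of 200 OK responses
    dropped = 0
    while pending:
        pending.sort()                  # serve requests in time order
        due, attempt = pending.pop(0)
        tick = int(due // tick_ms)
        if admitted.get(tick, 0) < budget:
            admitted[tick] = admitted.get(tick, 0) + 1   # fits the budget: 200 OK
        elif attempt + 1 >= max_attempts:
            dropped += 1                                 # out of attempts: give up
        else:
            backoff = 0.0 if immediate else min(cap_ms, base_ms * 2 ** attempt)
            jitter = rng.uniform(0, backoff)             # zero when retrying immediately
            pending.append((due + backoff + jitter, attempt + 1))
    return admitted, dropped

with_backoff, dropped_backoff = simulate()
no_backoff, dropped_immediate = simulate(immediate=True)
print(with_backoff, dropped_backoff)   # five writes in tick 0, four more in ticks 2-3, 0 dropped
print(no_backoff, dropped_immediate)   # {0: 5} 4
```

With backoff, all nine writes succeed. With immediate retries, the four rejected writes keep hitting the same exhausted tick until they run out of attempts, so four are dropped.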

Check yourself

Your nightly batch job gets a 429 Too Many Requests from the object store while writing to one prefix. What is the best response?