When you hammer one prefix too fast, the store says slow down — and the fix is to back off, not retry harder.
An object store (an S3-style blob store) accepts a fixed budget of requests per partition per moment — say five per tick. When a burst of nine requests slams the same prefix at once, the first few fit the budget and return 200 OK; the rest overflow and come back as 429 Too Many Requests ("Slow Down").
A 429 is not a failure to give up on — it is the store asking for a pause. The right move is exponential backoff with jitter: wait a doubling delay plus a small random nudge, then retry. That spreads the leftover load across later ticks until the burst drains, with zero requests ultimately dropped.
The client wraps every request in a retry loop. On a 429 it sleeps for a delay that doubles each attempt (base · 2^attempt), capped, plus a small random jitter so that many clients don't all retry on the same beat and re-collide. If the store sends a Retry-After header, that wins. Attempts are bounded so a truly stuck request fails cleanly instead of looping forever.
import random
import time

def put_with_backoff(client, key, body, base=0.1, cap=10.0, max_attempts=6):
    for attempt in range(max_attempts):
        resp = client.put(key, body)  # try the write
        if resp.status != 429:
            return resp  # 200 (or a real failure) -> done
        if attempt == max_attempts - 1:
            break  # out of attempts: don't sleep before giving up
        # 429 Too Many Requests: back off, then retry.
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            sleep = float(retry_after)  # honor the server's hint (assumes a value in seconds)
        else:
            backoff = min(cap, base * (2 ** attempt))  # 0.1, 0.2, 0.4, ...
            sleep = backoff + random.uniform(0, backoff)  # backoff plus random jitter
        time.sleep(sleep)
    raise RuntimeError("still throttled after %d attempts" % max_attempts)
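The loop above passes Retry-After straight to float(), which only works when the header carries a number of seconds. HTTP also allows the header to carry a date. Below is a minimal sketch of a parser that accepts both forms; the helper name retry_after_seconds and its None-on-failure convention are assumptions for illustration, not part of any particular client library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value to a non-negative delay in seconds.

    Returns None when the value cannot be parsed, so the caller can fall
    back to its own computed backoff. (Hypothetical helper.)
    """
    value = value.strip()
    try:
        return max(0.0, float(value))        # delay-seconds form, e.g. "3"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None                          # neither form: let the caller decide
    if when.tzinfo is None:                  # treat a naive date as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller could use the returned delay when it is not None and fall back to the computed backoff otherwise.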
| Strategy | What happens | When to use |
|---|---|---|
| Retry immediately | The rejected requests slam back instantly, amplifying the burst — a thundering herd that keeps the prefix over budget. | Never for a busy prefix. |
| Exponential backoff + jitter | Retries spread across doubling, de-synchronized delays; the burst drains into the rate budget. Slightly higher tail latency. | The default for any throttled client. |
| Spread writes across prefixes | Keys hash to many partitions, so no single prefix is hot. Avoids throttling at the source. | High sustained write volume; design-time fix. |
| Honor Retry-After | Client waits exactly as long as the store asks, neither too eager nor too patient. | Whenever the header is present. |
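The "spread writes across prefixes" row can be sketched as a small key-rewriting helper. This is an illustration only: the function name spread_key, the shard count of 16, and the two-hex-digit prefix format are all assumptions, and how a given store maps key prefixes to partitions should be checked against its own documentation.

```python
import hashlib


def spread_key(key, shards=16):
    """Prepend a stable, hash-derived shard so keys fan out across prefixes.

    Hypothetical helper: the same key always maps to the same shard, so
    reads can recompute the full key without a lookup table.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    shard = int(digest[:8], 16) % shards  # stable value in [0, shards)
    return f"{shard:02x}/{key}"
```

The trade-off is that listing everything written on one day now means listing under every shard prefix, so this fits write-heavy workloads better than list-heavy ones.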
429 means slow down, not "broken." Failing the whole batch job on it throws away work the store was willing to take a moment later.
Date-first keys such as logs/2026-06-26/... funnel a whole day's traffic onto one partition. Spread the entropy earlier in the key.
A batch job fires 9 writes at one prefix in a single tick. The budget is 5 per tick. Requests 1–5 return 200 OK; 6–9 come back 429. The client doesn't quit: it schedules retries with backoff 100 ms, 200 ms, 400 ms, each plus a little jitter so the four don't retry at the same instant.
Tick 2 has spare budget, so two of the retried writes land; tick 3 takes the last two. Total wall time is roughly the longest backoff plus a couple of ticks — under a second — and 0 requests are ultimately dropped. Backoff turned a spike that exceeded the budget into a smooth flow that fit inside it.
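The drain described above can be reproduced with a toy tick-based simulation. This is a sketch under simplifying assumptions that are not in the text: time is measured in whole ticks, backoff is 1, 2, 4, ... ticks plus a random number of extra ticks, and no other traffic competes for the budget, so the exact ticks on which retries land will differ from the walkthrough. The function name simulate and its parameters are hypothetical.

```python
import random


def simulate(burst=9, budget=5, max_attempts=6, seed=0):
    """Return (completion tick per request, number of 429s, dropped requests)."""
    rng = random.Random(seed)
    due = {req: 0 for req in range(burst)}  # request id -> tick of its next try
    attempts = {req: 0 for req in range(burst)}
    done, dropped, throttled = {}, [], 0
    tick = 0
    while due:
        ready = [req for req, t in due.items() if t == tick]
        for i, req in enumerate(ready):
            if i < budget:                   # fits this tick's budget: 200 OK
                done[req] = tick
                del due[req]
                continue
            throttled += 1                   # over budget: 429
            attempts[req] += 1
            if attempts[req] >= max_attempts:
                dropped.append(req)          # bounded retries: give up cleanly
                del due[req]
            else:
                backoff = 2 ** (attempts[req] - 1)  # 1, 2, 4, ... ticks
                due[req] = tick + backoff + rng.randint(0, backoff)  # plus jitter
        tick += 1
    return done, throttled, dropped
```

With the default burst of 9 against a budget of 5, four requests are throttled once each, and all nine complete with nothing dropped.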
Your nightly batch job gets a 429 Too Many Requests from the object store while writing to one prefix. What is the best response?