You save a file, ask for it back, and the system says “never heard of it,” because your read landed on a copy that hasn’t caught up yet.
When you upload an object, the store writes it to one primary node and acknowledges success right away. The new bytes are then copied to the replicas in the background, which keeps writes fast and the system available.
But a later read can be routed to any replica by a load balancer. If it lands on one that hasn’t received the copy yet, you see stale data — or a 404 for an object you know you just wrote. After a short window the replicas converge and every read agrees. That gap is the read-after-write hazard.
The write path is synchronous only to the primary; replication is fired off afterward. The read path picks some replica, so until replication lands it may return the old version. The fix is to keep at least the read you care about away from lagging replicas: route it to the primary, or carry a version token and demand a consistent read.
def put(key, data):
    primary.write(key, data)  # durable on primary, return now
    for r in replicas:
        replication_queue.enqueue(r, key, data)  # async, eventual
    return Ack(version=primary.version(key))

def get(key, consistent=False):
    if consistent:
        return primary.read(key)  # read-your-writes: skip replicas
    node = load_balancer.pick(replicas)
    return node.read(key)  # may be stale until convergence

# read-your-own-write with a version token
ack = put("avatar.png", data)
obj = get("avatar.png", consistent=True)  # or retry until version >= ack.version
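To watch the hazard happen, the pseudocode above can be turned into a small in-memory model. This is only a sketch under stated assumptions: `Node`, `ToyStore`, and `replicate()` are hypothetical names invented for illustration, the load balancer is replaced by simple round-robin, and replication is drained by an explicit call rather than a background worker so the outcome is deterministic.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One storage node (hypothetical): maps key -> (version, data)."""
    name: str
    objects: dict = field(default_factory=dict)

    def write(self, key, version, data):
        self.objects[key] = (version, data)

    def read(self, key):
        # None stands in for a 404.
        return self.objects.get(key)


class ToyStore:
    """Toy model of one primary plus asynchronously updated replicas."""

    def __init__(self, replica_count=2):
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self.queue = deque()   # pending (replica, key, version, data) copies
        self.next_replica = 0  # round-robin stand-in for the load balancer

    def put(self, key, data):
        current = self.primary.read(key)
        version = (current[0] if current else 0) + 1
        self.primary.write(key, version, data)  # synchronous write
        for replica in self.replicas:
            self.queue.append((replica, key, version, data))  # deferred copy
        return version  # acts as the version token

    def replicate(self):
        """Drain the queue; a real store would do this in the background."""
        while self.queue:
            replica, key, version, data = self.queue.popleft()
            replica.write(key, version, data)

    def get(self, key, consistent=False):
        if consistent:
            return self.primary.read(key)  # skip replicas entirely
        node = self.replicas[self.next_replica % len(self.replicas)]
        self.next_replica += 1
        return node.read(key)  # may be stale, or missing


store = ToyStore()
token = store.put("avatar.png", b"new-bytes")
stale = store.get("avatar.png")                   # replica has nothing yet
fresh = store.get("avatar.png", consistent=True)  # primary already has it
store.replicate()                                 # replicas converge
converged = store.get("avatar.png")               # replica has caught up
print(stale, fresh, converged)
```

In this model the first replica read comes back empty, the consistent read returns the new version immediately, and the replica read agrees once the queue has drained.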
| Dimension | Eventual (read replica) | Strong (read primary) |
|---|---|---|
| Read latency | Lower — nearest replica | Higher — one hot node |
| Staleness window | Milliseconds to seconds | None for that key |
| Read throughput | Scales with replica count | Bounded by the primary |
| Cost | Cheaper, fan-out reads | Pricier, less cacheable |
| App complexity | Must tolerate staleness / retry | Simpler mental model |
- GET immediately after a PUT can return 404: the object exists, but your read hit a replica that hasn’t received it.
- LIST right after a write may omit the new object, even though a direct GET on the primary would find it.
- A stale 404 or old body can outlive convergence by minutes, for example when a cache in front of the store keeps serving it.
You PUT s3://bucket/avatar.png and get 200 OK. Your UI immediately GETs it to render the new avatar. The load balancer routes that read to replica B, which is still 80 ms behind on the replication queue, so it answers 404 Not Found. Your page shows a broken image even though the upload “succeeded.”
A retry 200 ms later is routed to replica A, which has now converged, and returns 200 OK with the bytes. The robust fix: retry with backoff, or read the object back from the primary (a consistent read) for the one request that must reflect your own write.
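The two fixes can be combined: retry the replica read with exponential backoff until it reports a version at least as new as the one your write returned, then fall back to a consistent read if the replicas never catch up. The following is a minimal sketch, not any particular store's API. `read_your_write`, `read_replica`, and `read_primary` are hypothetical names, and the delays and attempt count are arbitrary illustrative values.

```python
import time


def read_your_write(read_replica, read_primary, min_version,
                    attempts=4, base_delay=0.05, sleep=time.sleep):
    """Return (version, data) that is no older than min_version.

    read_replica and read_primary are hypothetical callables that
    return (version, data), or None to represent a 404.
    """
    for attempt in range(attempts):
        result = read_replica()
        if result is not None and result[0] >= min_version:
            return result
        if attempt < attempts - 1:
            # Back off before the next try: 50 ms, 100 ms, 200 ms, ...
            sleep(base_delay * (2 ** attempt))
    # Replicas did not catch up in time: fall back to a consistent read.
    return read_primary()


# Simulated replica that answers 404 for the first two reads.
responses = iter([None, None, (1, b"avatar-bytes")])
delays = []
result = read_your_write(
    read_replica=lambda: next(responses),
    read_primary=lambda: (1, b"avatar-bytes"),
    min_version=1,
    sleep=delays.append,  # record the delays instead of really sleeping
)
```

Comparing versions rather than just checking for a non-404 response matters for overwrites, where a lagging replica returns the old body with a 200 instead of failing visibly.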
Note that Amazon S3 itself has provided strong read-after-write consistency for new objects, overwrites, deletes, and LIST operations since December 2020, so this exact new-object 404 no longer happens there. The eventual-consistency model here is still the right mental model for replicated stores in general, and for understanding why that guarantee mattered.
You upload report.pdf, get 200, then immediately GET it and receive a 404. What happened?
One request must reflect your own write. What’s the cleanest fix?