Read-after-write consistency

You save a file, ask for it back, and the system says “never heard of it” because your read landed on a copy that hasn’t caught up yet.

The idea

When you upload an object, the store writes it to one primary node and acknowledges success right away. The new bytes are then copied to several replicas in the background, so the write returns quickly and does not have to wait for every replica.

But a later read can be routed to any replica by a load balancer. If it lands on one that hasn’t received the copy yet, you see stale data — or a 404 for an object you know you just wrote. After a short window the replicas converge and every read agrees. That gap is the read-after-write hazard.
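The hazard can be reproduced with a toy model. The sketch below is illustrative only: `ToyStore`, `Node`, and `replicate` are hypothetical names, not any real store's API, and the background copy is reduced to an explicit call so the gap is easy to see.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A storage node: a plain key -> bytes map."""
    name: str
    objects: dict = field(default_factory=dict)

    def read(self, key):
        # None stands in for a 404 Not Found.
        return self.objects.get(key)


class ToyStore:
    """Hypothetical store: synchronous write to the primary, deferred copy to replicas."""

    def __init__(self, replica_count=2):
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self.pending = []  # (replica, key, data) copies not yet applied

    def put(self, key, data):
        self.primary.objects[key] = data
        for replica in self.replicas:
            self.pending.append((replica, key, data))
        return "200 OK"  # acknowledged before any replica has the bytes

    def get_from_replica(self, index, key):
        return self.replicas[index].read(key)

    def replicate(self):
        """Drain the queue; stands in for the background copy finishing."""
        for replica, key, data in self.pending:
            replica.objects[key] = data
        self.pending.clear()


store = ToyStore()
store.put("avatar.png", b"new-avatar")
before = store.get_from_replica(0, "avatar.png")  # replica has not caught up
store.replicate()
after = store.get_from_replica(0, "avatar.png")   # replicas have converged
print(before, after)
```

The first read comes back empty even though the write was acknowledged; the second succeeds only because replication ran in between.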

See it work

[Interactive diagram: a client sends PUT and GET requests; the primary holds v1; a load balancer routes reads to the replicas.]

How it works

The write path is synchronous only to the primary; replication is queued afterward and completes asynchronously. The read path picks some replica, so until replication lands it may return the old version, or nothing at all for a new key. The fix is to keep the one read that must see your write away from lagging replicas: route it to the primary, or carry the version token from the write and reject any response older than it.

def put(key, data):                  # "data" avoids shadowing the built-in bytes type
    primary.write(key, data)         # durable on primary; acknowledge immediately
    for r in replicas:
        replication_queue.enqueue(r, key, data)    # async, applied eventually
    return Ack(version=primary.version(key))

def get(key, consistent=False):
    if consistent:
        return primary.read(key)     # read-your-writes: skip replicas
    node = load_balancer.pick(replicas)
    return node.read(key)            # may be stale until convergence

# read-your-own-write: force a consistent read, or compare against the version token
ack = put("avatar.png", data)
obj = get("avatar.png", consistent=True)   # alternative: retry until the returned version >= ack.version
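The version-token alternative can be sketched as follows. This is a minimal, self-contained illustration under assumptions: `Versioned`, `Node`, and the `min_version` parameter are hypothetical names introduced here, and replication is deliberately left out so the replicas stay behind the primary.

```python
import random
from dataclasses import dataclass


@dataclass
class Versioned:
    version: int
    data: bytes


class Node:
    """Hypothetical storage node holding versioned objects."""

    def __init__(self):
        self.objects = {}

    def write(self, key, item):
        self.objects[key] = item

    def read(self, key):
        return self.objects.get(key)  # None stands in for a 404


primary = Node()
replicas = [Node(), Node()]


def put(key, data):
    """Write to the primary only and return the new version as the token."""
    current = primary.read(key)
    version = current.version + 1 if current else 1
    primary.write(key, Versioned(version, data))
    return version  # replication is omitted so the replicas stay behind


def get(key, min_version=None):
    """Read from a random replica; fall back to the primary if it is too old."""
    item = random.choice(replicas).read(key)
    if min_version is None:
        return item  # plain read: may be stale or missing
    if item is not None and item.version >= min_version:
        return item  # replica is fresh enough
    return primary.read(key)


token = put("avatar.png", b"new-avatar")
plain = get("avatar.png")                       # replicas never saw the write
checked = get("avatar.png", min_version=token)  # falls back to the primary
print(plain, checked.version)
```

Compared with always reading the primary, this keeps most reads on replicas and pays the primary's cost only when a replica is actually behind the caller's own write.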

Trade-offs

Dimension        | Eventual (read replica)         | Strong (read primary)
Read latency     | Lower: nearest replica          | Higher: one hot node
Staleness window | Milliseconds to seconds         | None for that key
Read throughput  | Scales with replica count       | Bounded by the primary
Cost             | Cheaper, fan-out reads          | Pricier, less cacheable
App complexity   | Must tolerate staleness / retry | Simpler mental model

Watch out for

Worked example

You PUT s3://bucket/avatar.png and get 200 OK. Your UI immediately GETs it to render the new avatar. The load balancer routes that read to replica B, which is still 80 ms behind on replication, so it answers 404 Not Found. Your page shows a broken image even though the upload “succeeded.”

A retry 200 ms later is routed to replica A, which has now converged, and returns 200 OK with the bytes. The robust fix: retry with backoff, or read the object back from the primary (a consistent read) for the one request that must reflect your own write.
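A retry-with-backoff helper for this scenario might look like the sketch below. The function name, its parameters, and the `flaky_read` stand-in for a lagging replica are all hypothetical; the `sleep` argument is injectable only so the example runs without real delays.

```python
import time


def get_with_backoff(read, key, attempts=4, base_delay=0.05, sleep=time.sleep):
    """Call read(key) until it returns something other than None.

    read is any callable that returns the object, or None for a 404.
    """
    delay = base_delay
    for attempt in range(attempts):
        result = read(key)
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2  # exponential backoff: 50 ms, 100 ms, 200 ms
    return None  # still missing after the final attempt


# Simulated replica that starts answering on the third call.
calls = {"count": 0}


def flaky_read(key):
    calls["count"] += 1
    return b"avatar-bytes" if calls["count"] >= 3 else None


delays = []  # record the requested delays instead of actually sleeping
result = get_with_backoff(flaky_read, "avatar.png", sleep=delays.append)
print(result, delays)
```

Retrying this way only helps when the stale answer is detectable, as a 404 is. A stale overwrite returns old bytes that look like a success, so that case needs a version check or a read from the primary instead.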

Note that since December 2020, Amazon S3 has provided strong read-after-write consistency for new objects, overwrites, and deletes, so this exact new-object 404 no longer happens there. The eventual-consistency model here is still the right mental model for replicated stores in general, and for understanding why that guarantee mattered.

Check yourself

You upload report.pdf, get 200, then immediately GET it and receive a 404. What happened?

One request must reflect your own write. What’s the cleanest fix?