Strong consistency with quorums

If enough replicas must agree to write, and enough must agree to read, the two groups are forced to overlap — so a read can never miss the latest write.

The idea

Object storage keeps several copies (replicas) of every object. With eventual consistency, a write is acknowledged the moment one replica has it, so a read elsewhere can still return a stale value for a while.

Strong consistency closes that window with a quorum. Spread the object across N replicas. A write isn't acknowledged until W of them durably hold it; a read isn't answered until R of them respond. Pick the numbers so that W + R > N, and the write set and read set are guaranteed to share at least one replica — that overlap always carries the newest version.

client
Five replicas all hold version v1. We're about to write v2.

How it works

The coordinator fans the write out to all replicas but waits only for W durable acks before telling the client "done". A read fans out and waits for R responses, then returns the one with the highest version. The W + R > N assertion is what makes the read see the write.

N = 5            # replicas
W = 3            # write quorum
R = 3            # read quorum
assert W + R > N # 3 + 3 = 6 > 5  -> read & write sets overlap

def write(obj, value, version):
    send_to_all_replicas(obj, value, version)
    acks = 0
    for reply in await_replies():          # slowest replica sets the pace
        if reply.ok: acks += 1
        if acks >= W:                       # enough durable copies
            return "acknowledged"           # only NOW is the client told

def read(obj):
    replies = []
    for reply in await_replies():
        replies.append(reply)
        if len(replies) >= R:               # enough responses
            break
    return max(replies, key=lambda r: r.version).value  # newest wins

Trade-offs

PropertyStrong (W+R>N)Eventual
Read after a successful writeAlways sees the latest valueMay read a stale value for a while
LatencyPaced by the slowest replica in the quorumOne fast ack; no waiting on peers
Availability under a partitionLower — must reach a full quorumHigher — any reachable replica serves
Tuning for read-heavy loadSmall R, large W (cheap reads)n/a — reads always cheap
Tuning for write-heavy loadSmall W, large R (cheap writes)n/a — writes always cheap

Watch out for

Worked example

Take N = 5, W = 3, R = 3. We write v2 and it is durably acked by replicas {1, 2, 3} — that's three acks, so the client is told the write succeeded.

Now a read contacts replicas {3, 4, 5}. Since W + R = 6 > 5, those two sets cannot be disjoint — they meet at replica 3, which already holds v2. The read takes the max version across its three replies and returns v2. No stale window.

Now weaken it: W = 1, R = 1. The write acks at replica 1 only; a read can land on replica 5, which still has v1. Because 1 + 1 = 2 <= 5, the sets can miss each other entirely and the read returns the stale v1.

Check yourself

With N = 5, which pair gives the read-after-write guarantee?

A write is acked by {1,2,3} and a read hits {3,4,5}. Why is the read safe?