Two writers, one file, a lost update

When both clients read the same value before either writes it back, the second write quietly paves over the first.

The idea

Updating a shared object usually means a read-modify-write: read the current value, compute a new one, write it back. That is three separate steps, and other clients can sneak in between them.

If client A reads 100, then client B reads 100 before A has written, both compute from a stale view. B writes last and clobbers A — A's change is silently lost. That gap between checking a value and acting on it is the TOCTOU (time-of-check to time-of-use) race.
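The interleaving described above can be replayed deterministically in a single thread. This is only a sketch: the `store` dict is a hypothetical stand-in for the shared object, and the steps are written out by hand in the order that loses A's update.

```python
# Hypothetical in-memory "store": a plain dict standing in for the shared object.
store = {"counter": 100}

# Step 1: both clients read before either one writes.
a_seen = store["counter"]   # client A reads 100
b_seen = store["counter"]   # client B reads 100 (A has not written yet)

# Step 2: each client computes from its own, now stale, copy.
a_new = a_seen + 10
b_new = b_seen + 10

# Step 3: both write back; B writes last and overwrites A's result.
store["counter"] = a_new
store["counter"] = b_new

final = store["counter"]
print(final)  # 110, although two +10 updates were applied
```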

The fix is to stop two read-modify-write sequences from interleaving unnoticed: take a lock so only one writer runs read-modify-write at a time, or use a versioned compare-and-swap that writes only if the value hasn't changed since you read it, and retries if it has.

In the code below, each client adds 10 to the counter.

How it works

The broken version reads, modifies, and writes as three separate steps, so another writer can interleave between the read and the write. Wrapping those three steps in a lock makes the section indivisible. A compare-and-swap loop prevents the same lost update without blocking: a writer whose view has gone stale has its write rejected and tries again.

# BROKEN — read-modify-write with no serialization
def add_ten(store, key):
    value = store.read(key)      # A and B can both read 100 here
    value = value + 10           # both compute 110 from a stale view
    store.write(key, value)      # last writer wins; one +10 is lost

# FIXED (lock) — serialize the critical section
def add_ten_locked(store, key, lock):
    with lock:                   # only one writer in here at a time
        value = store.read(key)  # read, modify, write now run as one unit
        store.write(key, value + 10)

# FIXED (compare-and-swap) — lock-free; no one is ever blocked
def add_ten_cas(store, key):
    while True:
        value, version = store.read_versioned(key)
        new = value + 10
        # write ONLY if the version still matches what we read
        if store.write_if(key, new, expected_version=version):
            return                # success
        # a concurrent write bumped the version — loop and retry
        # CAS is lock-free: failure just means "try again", never "wait"
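To try the compare-and-swap version, something has to provide `read_versioned` and `write_if`. The sketch below assumes a hypothetical in-memory `VersionedStore`; its internal `threading.Lock` only stands in for the atomic conditional write a real store would provide, and the `put` helper is an added assumption. `add_ten_cas` is repeated so the block runs on its own.

```python
import threading


class VersionedStore:
    """Hypothetical in-memory store. The internal lock stands in for the
    atomic conditional write that a real store would provide."""

    def __init__(self):
        self._data = {}                  # key -> (value, version)
        self._guard = threading.Lock()

    def put(self, key, value):
        # Unconditional write; bumps the version.
        with self._guard:
            _, version = self._data.get(key, (None, 0))
            self._data[key] = (value, version + 1)

    def read_versioned(self, key):
        with self._guard:
            return self._data[key]

    def write_if(self, key, value, expected_version):
        # Compare the version and write as one indivisible step.
        with self._guard:
            _, version = self._data[key]
            if version != expected_version:
                return False             # someone else wrote first
            self._data[key] = (value, version + 1)
            return True


def add_ten_cas(store, key):
    while True:
        value, version = store.read_versioned(key)
        if store.write_if(key, value + 10, expected_version=version):
            return                       # our write landed
        # version changed under us: re-read and retry


store = VersionedStore()
store.put("counter", 100)

threads = [threading.Thread(target=add_ten_cas, args=(store, "counter"))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_value, final_version = store.read_versioned("counter")
print(final_value)  # 180: all eight +10 updates survive
```

Every successful write bumps the version, so a writer holding an older version can never overwrite a newer value; it can only re-read and try again.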

Cost & signals

Property	What it costs you
Correctness	Without serialization, concurrent read-modify-write silently drops updates (lost-update bug).
Lock latency	A lock serializes writers; under contention they queue and wait, adding tail latency.
CAS retries	Lock-free, but a stale write is rejected and retried — wasted work grows with contention.
Deadlock risk	Multiple locks taken in inconsistent order can deadlock; lock leases / timeouts bound the damage.
Signal	Rising conditional-write-failed / version-mismatch rate, or counters that don't add up (lost increments).
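One way to surface the version-mismatch signal is to count rejected writes next to total attempts. The sketch below is an assumption-laden illustration: `VersionedCell`, the `stats` dict, and the `interfere` hook are hypothetical names, and the hook forces exactly one conflicting write so the numbers are reproducible.

```python
class VersionedCell:
    """Hypothetical single-value store with a version number."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read_versioned(self):
        return self.value, self.version

    def write_if(self, new, expected_version):
        if self.version != expected_version:
            return False
        self.value = new
        self.version += 1
        return True


def add_ten_counted(cell, stats, interfere=None):
    while True:
        value, version = cell.read_versioned()
        if interfere is not None:
            interfere()              # another writer lands between read and write
            interfere = None         # only interfere once
        stats["attempts"] += 1
        if cell.write_if(value + 10, expected_version=version):
            return
        stats["version_mismatch"] += 1   # the signal worth watching


cell = VersionedCell(100)
stats = {"attempts": 0, "version_mismatch": 0}


def other_writer():
    value, version = cell.read_versioned()
    cell.write_if(value + 10, expected_version=version)


add_ten_counted(cell, stats, interfere=other_writer)
failure_rate = stats["version_mismatch"] / stats["attempts"]
print(cell.value, stats, failure_rate)
```

Here the first attempt is rejected and the second succeeds, so both increments are kept and the mismatch rate is one in two. A rate that climbs over time suggests contention on that key is growing.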

Watch out for

Read-modify-write with no lock or CAS is a lost-update bug waiting to happen — the moment two writers overlap, one update vanishes.
The check and the use must be one atomic step. Checking a value and then acting on it as separate operations still races (TOCTOU) — the value can change in between.
Holding a lock too long, forgetting to release it, or crashing while it's held stalls everyone. Use lock leases / timeouts so a dead holder eventually frees the lock.
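CAS can starve individual writers under high contention. A rejected write means some other writer succeeded, so the system as a whole keeps moving, but an unlucky writer may retry many times and the wasted work piles up. Add backoff or fall back to a lock.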
Don't assume a single replica. Last-writer-wins across replicas can still lose data even when each replica is internally consistent.
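For the in-process case, a bounded wait is one way to keep a stuck lock holder from stalling everyone. The sketch below uses the `timeout` argument of Python's `threading.Lock.acquire`; the function name, the `state` dict, and the timeout values are hypothetical. It is a waiter-side timeout, not a lease: a lease that frees a dead holder's lock needs expiry support in whatever service issues the lock.

```python
import threading


def add_ten_with_timeout(state, lock, timeout_s=0.5):
    """Add 10 under the lock, giving up if the lock is not free in time."""
    if not lock.acquire(timeout=timeout_s):
        return False                 # caller decides: retry, queue, or report
    try:
        state["counter"] += 10       # read-modify-write inside the lock
        return True
    finally:
        lock.release()               # released even if the update raises


state = {"counter": 100}
lock = threading.Lock()

print(add_ten_with_timeout(state, lock))            # lock is free, update applied

lock.acquire()                                      # simulate a holder that never lets go
print(add_ten_with_timeout(state, lock, 0.05))      # gives up instead of hanging
lock.release()
```

The `try`/`finally` guards against forgetting to release the lock, and the timeout turns an indefinite stall into an explicit failure the caller can handle.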

Worked example

A page has a shared views counter sitting at 100. Two requests arrive at the same instant, and each does views = read() + 1 then writes the result back.

Request 1 reads 100. Before it writes, request 2 also reads 100. Both compute 101, and both write 101. The counter ends at 101 — but two views happened, so it should read 102. One increment is silently lost.

A per-key lock (serialize the two requests) or an atomic increment / CAS-with-retry fixes it: the second request reads the value the first already committed, computes 102, and writes that.
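A minimal sketch of the lock-based fix for this example, assuming the counter lives in process memory. The `views` dict, `views_lock`, and `record_view` are hypothetical names chosen for illustration.

```python
import threading

views = {"count": 100}
views_lock = threading.Lock()        # one lock guarding this one counter


def record_view():
    with views_lock:                 # the second request waits here
        current = views["count"]     # so it reads the committed value
        views["count"] = current + 1


requests = [threading.Thread(target=record_view) for _ in range(2)]
for r in requests:
    r.start()
for r in requests:
    r.join()

print(views["count"])  # 102
```

Whichever request takes the lock second reads 101 rather than 100, so both views are counted.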

Check yourself

Two clients run x = read(); write(x + 10) on a counter at 100, with no lock, and their reads both happen before either write. What is the final stored value?

Why does a versioned compare-and-swap prevent the lost update without taking a lock?