A network partition tears one cluster into two halves — and if each half still thinks it's in charge, both keep writing and the data diverges.
Picture five storage nodes that agree on a value, x=1. A network link drops and the cluster is cut into two groups that can no longer talk to each other. Each group can still see itself — and if each one keeps accepting writes, believing it's the live cluster, they diverge: one side ends up with x=2, the other with x=3. That conflict is split-brain, and when the partition heals you're left guessing which value was "real".
The fix is a quorum: only let a side accept writes if it holds a majority of all nodes (more than half). A partition can carve the cluster up however it likes, but at most one piece can hold a majority — so there's only ever one writable side, and the data can't fork.
Give every node a vote. A side may accept a write only if it can reach a majority of the cluster — floor(N/2) + 1 nodes. Because two disjoint groups can't both be more than half of the same whole, a partition produces at most one majority side. That one side stays writable; every minority side must refuse writes (go read-only) until it rejoins. No two writable sides means no divergence.
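```python
QUORUM = N // 2 + 1  # majority of N voting nodes, e.g. N=5 -> 3

def accept_write(value):
    # count_reachable_nodes() and commit() stand in for the cluster's own
    # membership check and log append; they are not defined here.
    reachable = count_reachable_nodes()  # includes this node itself
    if reachable >= QUORUM:
        commit(value)  # this side holds the majority
        return "OK"
    else:
        # Minority side fences itself and stays read-only until it rejoins.
        return "REJECT: lost quorum, read-only"
```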
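The claim that a partition yields at most one majority side is easy to check exhaustively for small clusters. The sketch below is illustrative only. `quorum`, `tolerated_failures` and `writable_sides` are hypothetical helper names, and a "split" is modelled simply as choosing which nodes land on the left side. For each N it reports:

- the quorum size,
- how many failures the cluster tolerates,
- how many splits leave no writable side at all.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Majority size for n voting nodes: floor(n/2) + 1."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that may fail while the rest still form a majority: floor((n-1)/2)."""
    return (n - 1) // 2


def writable_sides(n: int) -> list[int]:
    """For every two-way split of n nodes, count the sides that hold a quorum.

    Returns one count per split; the quorum rule promises each count is 0 or 1.
    """
    counts = []
    for k in range(n + 1):
        for left in combinations(range(n), k):
            left_size = len(left)
            right_size = n - left_size
            counts.append(sum(size >= quorum(n) for size in (left_size, right_size)))
    return counts


for n in range(3, 7):
    counts = writable_sides(n)
    print(f"N={n}: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s), "
          f"max writable sides={max(counts)}, splits with none={counts.count(0)}")
```

For N=3 and N=5, every split leaves exactly one writable side. For N=4 and N=6, the even splits (2–2 and 3–3) leave none. That is the stall an odd cluster size avoids.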
A leader that loses quorum must step down rather than keep accepting writes it can no longer commit safely. Real systems enforce this in three common ways:

- **Leases.** A leader acts only while a time-bounded grant from the majority is still valid.
- **Fencing tokens.** A monotonically increasing epoch is attached to every write, so storage rejects a deposed leader's late writes.
- **STONITH** ("shoot the other node in the head"). The minority's nodes are forcibly powered off, so they cannot write at all.

AP systems deliberately accept writes on both sides. For them, divergence is expected and is resolved on heal with last-writer-wins, version vectors, or CRDTs.
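The fencing-token idea can be sketched as a storage node that remembers the highest epoch it has seen and refuses any write carrying an older one. This is a minimal illustration that assumes a single storage node and integer epochs. `FencedStore` and its `write` method are hypothetical names, not a real library API.

```python
class FencedStore:
    """Storage that rejects writes from leaders holding an outdated epoch (fencing token)."""

    def __init__(self) -> None:
        self.highest_epoch = 0          # largest token accepted so far
        self.data: dict[str, int] = {}

    def write(self, epoch: int, key: str, value: int) -> bool:
        # A token lower than one already seen means a newer leader exists,
        # so this write comes from a deposed leader and must be refused.
        if epoch < self.highest_epoch:
            return False
        self.highest_epoch = epoch
        self.data[key] = value
        return True


store = FencedStore()
store.write(epoch=1, key="x", value=2)         # old leader, epoch 1
store.write(epoch=2, key="x", value=5)         # new leader elected with epoch 2
late = store.write(epoch=1, key="x", value=3)  # old leader wakes up and retries
print(late, store.data)
```

Equal epochs are accepted so the current leader can keep writing. In a real system the epoch would be issued through the quorum during leader election, which this sketch leaves out.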
| Property | What it means |
|---|---|
| Availability vs consistency | CP: minority side refuses writes (no divergence, lower availability). AP: both sides accept & reconcile later (always writable, but conflicts to resolve). |
| Quorum size | floor(N/2) + 1. For N=5 that's 3: a 3-node side has quorum, a 2-node side does not. |
| Failures tolerated | floor((N-1)/2). N=5 survives 2 down nodes and still forms a majority; N=3 survives 1. |
| Why odd N | Even N wastes a node: N=4 still tolerates only 1 failure (same as N=3), and a clean 2–2 split leaves neither side with a majority, so the whole cluster stalls read-only. Use an odd N, or add a lightweight witness / arbiter to break ties. |
| Reconciliation cost on heal | CP: near zero, because the minority just copies the majority's log. AP: O(diverged keys) merge work, plus the risk of dropping a write under last-writer-wins. |

Five nodes hold x=1; the network splits them 3 | 2.
```text
Without quorum (both sides write):
  left  (3 nodes) accepts x=2
  right (2 nodes) accepts x=3
  heal -> CONFLICT: x=2 vs x=3, must reconcile (and maybe lose a write)

With quorum (need 3 of 5):
  left  (3 nodes) has majority   -> writable, commits x=2
  right (2 nodes) lacks majority -> read-only, refuses writes
  heal -> right copies the majority log, everyone x=2, no conflict
```
Same partition, same nodes; the only difference is the rule. Quorum guarantees the 2-node side can't write, so there is nothing to reconcile when the link comes back.
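The two runs can be replayed in a few lines. This is a toy model built on a few assumptions:

- each side of the partition holds a single copy of `x`,
- each side receives one client write,
- "heal" either adopts the majority's value or compares the two sides.

`run_partition` and its `use_quorum` flag are illustrative names only.

```python
N = 5
QUORUM = N // 2 + 1  # 3 of 5


def run_partition(use_quorum: bool) -> str:
    # Both sides start from the agreed value x=1, then the link drops: 3 | 2.
    sides = {"left": {"size": 3, "x": 1}, "right": {"size": 2, "x": 1}}
    proposed = {"left": 2, "right": 3}  # the write each side receives

    for name, side in sides.items():
        if use_quorum and side["size"] < QUORUM:
            continue  # minority side is read-only and refuses the write
        side["x"] = proposed[name]

    if use_quorum:
        # Heal: the minority discards its copy and adopts the majority's value.
        majority = max(sides.values(), key=lambda s: s["size"])
        return f"everyone x={majority['x']}, no conflict"

    # Heal without quorum: both sides may have written, so compare them.
    left_x, right_x = sides["left"]["x"], sides["right"]["x"]
    if left_x != right_x:
        return f"CONFLICT: x={left_x} vs x={right_x}"
    return f"everyone x={left_x}"


print("without quorum:", run_partition(use_quorum=False))
print("with quorum:   ", run_partition(use_quorum=True))
```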
A 4-node cluster splits exactly 2 | 2. Under majority quorum (need 3), which side keeps accepting writes?
An AP store lets both halves write during a partition. When the link heals, what does it actually have to do?