Walking the ring to ensure data survives a server explosion.
Consistent Hashing maps data onto a "Hash Ring." To find where data belongs, you hash its key, find that spot on the ring, and walk clockwise to find the first Server. But what if that server dies? We lose the data! To prevent this, distributed databases (like Cassandra or DynamoDB) use N-Way Replication. Instead of stopping at the first server, you keep walking clockwise and store copies of the data on the first N distinct servers you encounter.
When data is written, the system calculates the hash. It looks up the "Coordinator Node" (the first node clockwise). The Coordinator is responsible for sending copies to the next N-1 nodes on the ring. This ordered list of nodes responsible for a key is called the Preference List.
# Finding the Replica Set for a key
def get_replica_set(key, ring, N=3):
position = hash(key)
replicas = []
# Find the index of the first node clockwise
idx = find_first_node_after(ring, position)
while len(replicas) < N:
node = ring[idx]
if node not in replicas: # Handle virtual nodes!
replicas.append(node)
idx = (idx + 1) % len(ring) # Walk clockwise
return replicas
Writing to 3 nodes takes longer than writing to 1. To remain fast, systems often return "Success" to the user as soon as W nodes acknowledge the write (where W < N). For example, if N=3 and W=2, the coordinator replies success after 2 nodes save it, while the 3rd node is updated asynchronously in the background.