Anti-Entropy Convergence

How distributed replicas gossip to heal silent data divergence.

The idea

In eventually consistent systems (like DynamoDB or Cassandra), network partitions or dropped packets can cause one database replica to miss a write. "Anti-entropy" is a background process where replicas randomly pair up, compare their data, and resolve differences to ensure they eventually converge to the exact same state.

Step 1: Three replicas exist. Node A receives an update (V2). Nodes B and C missed it.

How it works (Merkle Trees)

Comparing terabytes of data over the network is too slow. Instead, databases use a Merkle Tree (a tree of hashes). Replicas only compare the root hash. If it differs, they traverse down the tree to find exactly which blocks of data differ, transferring only the missing data.

# Simplified Anti-Entropy Round
def anti_entropy_round(node1, node2):
    # 1. Compare root hashes (O(1) network cost)
    if node1.merkle_tree.root_hash() == node2.merkle_tree.root_hash():
        return # Perfect sync!
        
    # 2. Find differences by traversing the tree
    differing_keys = find_mismatched_leaves(node1.merkle_tree, node2.merkle_tree)
    
    # 3. Exchange data for only those keys
    for key in differing_keys:
        val1, timestamp1 = node1.get(key)
        val2, timestamp2 = node2.get(key)
        
        # Last-Write-Wins (LWW) resolution
        if timestamp1 > timestamp2:
            node2.put(key, val1, timestamp1)
        else:
            node1.put(key, val2, timestamp2)

Cost

Time Complexity: O(log N) network rounds to find a mismatched block of data using a Merkle tree, where N is the number of data blocks. Space Complexity: O(N) to store the Merkle tree hashes in memory/disk.

Watch out for

CPU/Disk I/O spikes: Recalculating Merkle trees constantly is expensive. Systems usually batch updates and run anti-entropy during off-peak hours.
Deleted Data (Tombstones): If you delete a row, you can't just erase it, otherwise anti-entropy will think the node just "missed" the row and will re-add it! You must write a "Tombstone" marker indicating the row is dead.