Replication & consistency

Copying data to multiple machines for reliability introduces the risk of reading stale data.

The idea

To avoid losing data if a server dies, we Replicate it. In a Leader-Follower setup, all writes go to the Leader, which then copies them to Followers. But copying takes time (Replication Lag).

If the network breaks (a Partition), you face a choice (the CAP Theorem). You can either stop serving requests (Consistency over Availability), or you can let users read from isolated followers, risking stale data (Availability over Consistency / Eventual Consistency).

Cluster is healthy. Leader and followers agree on 'v1'.

How it works (Sync vs Async Replication)

def write(data, sync=False):
    # 1. Leader writes to its own disk
    local_write(data)
    
    if sync:
        # Strong Consistency: Wait for followers before replying
        # (Safe, but slow. If a follower dies, the write blocks).
        wait_for_followers_to_ack()
        return "Success"
    else:
        # Eventual Consistency: Reply immediately, replicate in bg
        # (Fast, but unsafe. If leader dies now, data is lost).
        trigger_background_replication()
        return "Success"