Backup & Restore

Protecting your state against human error and physical disasters.

The idea

High availability (like running 3 database replicas) protects you if a single server crashes. But if an admin accidentally types `DROP TABLE users`, that destructive command is instantly replicated to all 3 servers! To survive logical errors or region-wide physical disasters, you must take point-in-time snapshots of your data and ship them to cold storage.
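A toy model makes the difference concrete. This sketch assumes replication simply forwards every operation to each replica (real systems ship log records, not statements) and models each replica as a Python dict of tables. `Replica`, `replicate`, and the sample data are hypothetical.

```python
import copy

class Replica:
    """Hypothetical replica: a dict mapping table names to lists of rows."""

    def __init__(self, tables):
        self.tables = copy.deepcopy(tables)

    def apply(self, op, table, row=None):
        if op == "INSERT":
            self.tables.setdefault(table, []).append(row)
        elif op == "DROP":
            self.tables.pop(table, None)

def replicate(replicas, op, table, row=None):
    # Replication is faithful: every replica applies the same operation, good or bad.
    for r in replicas:
        r.apply(op, table, row)

initial = {"users": [{"id": 1, "name": "ada"}]}
replicas = [Replica(initial) for _ in range(3)]

# An independent snapshot: a deep copy that later writes cannot touch.
snapshot = copy.deepcopy(replicas[0].tables)

replicate(replicas, "DROP", "users")

print(["users" in r.tables for r in replicas])  # [False, False, False]
print("users" in snapshot)                       # True
```

All three replicas lose the table because they are doing their job; only the copy taken before the mistake still holds the rows.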


How it works (Full + Incremental)

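Taking a full backup of a 10TB database every day is too expensive: you copy all 10TB daily even if only a small fraction of it changed. Instead, many systems take a full backup once a week and continuously archive the Write-Ahead Log (WAL), the log every change is written to before it is applied to the data files. The archived WAL then serves as an incremental backup of every transaction since the last full backup.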
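To see why a full backup plus the WAL is enough to rebuild the database, here is a toy key-value store that appends every write to an in-memory log before applying it. `ToyStore`, `rebuild`, and the `(lsn, op, key, value)` record format are hypothetical; real WAL records are binary page- or row-level changes, not Python tuples.

```python
import copy

class ToyStore:
    """Hypothetical key-value store that logs each write before applying it."""

    def __init__(self):
        self.data = {}
        self.wal = []       # append-only log of (lsn, op, key, value)
        self.next_lsn = 1   # log sequence number of the next record

    def write(self, op, key, value=None):
        # Write-ahead: record the change first, then apply it.
        self.wal.append((self.next_lsn, op, key, value))
        self.next_lsn += 1
        self.apply(op, key, value)

    def apply(self, op, key, value):
        if op == "SET":
            self.data[key] = value
        elif op == "DELETE":
            self.data.pop(key, None)

    def full_backup(self):
        # A full backup captures the data plus the last LSN it already includes.
        return {"data": copy.deepcopy(self.data), "lsn": self.next_lsn - 1}

def rebuild(full, wal):
    # Start from the full backup, then replay only records written after it.
    store = ToyStore()
    store.data = copy.deepcopy(full["data"])
    for lsn, op, key, value in wal:
        if lsn > full["lsn"]:
            store.apply(op, key, value)
    return store.data

db = ToyStore()
db.write("SET", "alice", 100)
weekly = db.full_backup()       # the "Sunday" full backup
db.write("SET", "bob", 50)      # incremental changes during the week
db.write("DELETE", "alice")

print(rebuild(weekly, db.wal) == db.data)  # True
```

The full backup is only a starting point; the log carries everything that happened afterwards, which is why archiving it continuously is cheaper than repeating full copies.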

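```python
# Point-in-Time Recovery (PITR) concept.
# `storage` and `db` are hypothetical clients; their method names are illustrative.

def restore_database(storage, db, target_time):
    # 1. Load the most recent FULL backup taken at or before target_time
    candidates = [b for b in storage.list_full_backups() if b.taken_at <= target_time]
    if not candidates:
        raise ValueError("no full backup precedes target_time")
    full_backup = max(candidates, key=lambda b: b.taken_at)
    db.load_snapshot(storage.download(full_backup.path))

    # 2. Fetch every WAL file written since that backup, oldest first
    wal_files = sorted(storage.list_wal(since=full_backup.taken_at),
                       key=lambda w: w.start_time)

    # 3. Replay transactions in order, stopping at the first one past target_time
    for wal in wal_files:
        for transaction in wal.transactions:
            if transaction.timestamp > target_time:
                print(f"Database restored to {target_time}")
                return  # leaves both loops; a bare `break` would only exit the inner one
            db.apply(transaction)

    # The archived WAL ran out first: this is the latest state we can recover
    print(f"Restored to the end of archived WAL, which stops before {target_time}")
```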
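Steps 1 and 2 hinge on choosing the right files. A minimal sketch of that selection, assuming the backup catalog describes full backups by the time they were taken and WAL files by `(start, end)` time ranges (the catalog format and the names `plan_restore`, `full_backups`, `wal_segments` are hypothetical): it picks the newest full backup at or before the target time, then every WAL segment overlapping the window between that backup and the target.

```python
from bisect import bisect_right
from datetime import datetime

def plan_restore(full_backups, wal_segments, target_time):
    """Return (base_backup_time, [WAL segments to replay]).

    full_backups: sorted list of datetimes when full backups were taken.
    wal_segments: list of (start, end) datetimes, one per archived WAL file.
    """
    # Newest full backup at or before the target time.
    i = bisect_right(full_backups, target_time)
    if i == 0:
        raise ValueError("no full backup precedes the target time")
    base = full_backups[i - 1]

    # Keep segments that end after the base backup and start before the target.
    needed = [seg for seg in sorted(wal_segments)
              if seg[1] > base and seg[0] <= target_time]
    return base, needed

def d(day, hour=0):
    return datetime(2024, 6, day, hour)

fulls = [d(2), d(9)]  # two Sunday full backups
wal = [(d(9), d(10)), (d(10), d(11)), (d(11), d(12)), (d(12), d(13))]
base, segs = plan_restore(fulls, wal, d(11, 14))
print(base)       # 2024-06-09 00:00:00
print(len(segs))  # 3
```

Restoring to Tuesday afternoon needs only the second Sunday's backup and the three WAL segments up to that point; the first Sunday's backup and Wednesday's WAL are never downloaded.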

Cost

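Storage is the obvious cost: object storage such as S3 is cheap per gigabyte, but weekly full copies of a multi-terabyte database plus every WAL file in between still add up. The bigger cost is recovery time, which must fit within your RTO (Recovery Time Objective, the maximum downtime you can tolerate). Restoring the snapshot and then replaying days of WAL files can take hours.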
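A back-of-the-envelope estimate shows how quickly those hours accumulate. This sketch assumes recovery is dominated by two sequential phases, downloading the full backup and replaying the WAL, and every throughput and WAL-volume figure below is a hypothetical placeholder; real numbers should come from a timed restore drill.

```python
def estimate_recovery_hours(full_backup_tb, download_mb_s, wal_tb, replay_mb_s):
    """Rough recovery time: download the full backup, then replay the WAL.

    Ignores parallelism, decompression, and startup checks. 1 TB = 1,000,000 MB.
    """
    download_s = full_backup_tb * 1_000_000 / download_mb_s
    replay_s = wal_tb * 1_000_000 / replay_mb_s
    return (download_s + replay_s) / 3600

# Hypothetical inputs: a 10 TB snapshot pulled at 500 MB/s,
# then six days of WAL (assumed 1.2 TB total) replayed at 100 MB/s.
hours = estimate_recovery_hours(10, 500, 1.2, 100)
print(f"{hours:.1f} hours")  # 8.9 hours
```

Under these assumptions the download alone takes about 5.6 hours and WAL replay adds about 3.3 more, so restoring close to the end of the week is the worst case. Taking full backups more often shortens the replay phase at the price of more storage.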

Watch out for

Unverified Backups: A backup that cannot be restored is not a backup. Automate regular restore tests that pull the latest backup, boot it in a sandbox, and check that the data is intact.
Cross-Region Requirements: If your database is in AWS `us-east-1` and your S3 bucket is also in `us-east-1`, a total region failure takes out both. Backups must be shipped to a physically separate region.
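An automated restore test can be small. This sketch uses Python's built-in `sqlite3` module as a stand-in for a production database: it writes a backup with `Connection.backup`, opens that file in a separate sandbox connection, and checks that it passes `PRAGMA integrity_check` and holds the expected row count. The function name `verify_backup` and the row-count check are illustrative assumptions; a real system would plug in its own backup tooling and validation queries.

```python
import os
import sqlite3
import tempfile

def verify_backup(backup_path, table, expected_rows):
    """Open a backup in isolation and check it is usable, not just present."""
    sandbox = sqlite3.connect(backup_path)
    try:
        (status,) = sandbox.execute("PRAGMA integrity_check").fetchone()
        # `table` comes from trusted config here, so the f-string is safe.
        (rows,) = sandbox.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        return status == "ok" and rows == expected_rows
    finally:
        sandbox.close()

# Build a small "production" database in memory.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
prod.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("linus",)])
prod.commit()

with tempfile.TemporaryDirectory() as tmp:
    backup_path = os.path.join(tmp, "users-backup.db")
    dest = sqlite3.connect(backup_path)
    prod.backup(dest)  # copies every page of the live database into the file
    dest.close()

    ok = verify_backup(backup_path, "users", expected_rows=2)
    print("backup verified" if ok else "BACKUP FAILED")  # backup verified
```

Run on a schedule, a check like this turns "we have backups" into "we restored one yesterday", and a failure alerts you before an outage does.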