Object & blob storage

Storing massive files cheaply using math to survive hardware failures.

The idea

To store a 10GB video, you could just copy it to 3 different hard drives (Replication). But that uses 30GB of space! Object Stores (like AWS S3) use a smarter trick: Erasure Coding.

It splits the file into pieces (e.g. 4 Data Shards) and computes 2 Parity Shards using math. You distribute these 6 shards across 6 different servers. The magic: Any 4 shards can reconstruct the original file! You survive 2 server failures while only using 1.5x the storage space (15GB instead of 30GB). This gives S3 its famous "11 nines" of durability (99.999999999%).

Durability: 99.999999999%
File split into 4 Data shards + 2 Parity shards.

How it works (Erasure Coding)

# Data: A, B, C, D
# Parity: P1, P2 (computed mathematically)

def read_file():
    available = get_healthy_shards() # e.g. [A, B, P1, P2]
    
    if len(available) < 4:
        raise DataLossException("Too many failures!")
        
    if all_data_present(available):
        return [A, B, C, D]
        
    # We lost C and D, but we have 4 pieces! Math time.
    return reconstruct_missing(available)