Storing massive files cheaply using math to survive hardware failures.
To store a 10GB video, you could just copy it to 3 different hard drives (Replication). But that uses 30GB of space! Object Stores (like AWS S3) use a smarter trick: Erasure Coding.
It splits the file into pieces (e.g. 4 Data Shards) and computes 2 Parity Shards using math. You distribute these 6 shards across 6 different servers. The magic: Any 4 shards can reconstruct the original file! You survive 2 server failures while only using 1.5x the storage space (15GB instead of 30GB). This gives S3 its famous "11 nines" of durability (99.999999999%).
# Data: A, B, C, D
# Parity: P1, P2 (computed mathematically)
def read_file():
available = get_healthy_shards() # e.g. [A, B, P1, P2]
if len(available) < 4:
raise DataLossException("Too many failures!")
if all_data_present(available):
return [A, B, C, D]
# We lost C and D, but we have 4 pieces! Math time.
return reconstruct_missing(available)