Storage data structures

How modern databases achieve massive write throughput using LSM Trees.

The idea

Writing directly to disk randomly (like updating a B-Tree in place) is very slow. Modern databases (Cassandra, RocksDB, LevelDB) use Log-Structured Merge (LSM) Trees.

Writes first go to a fast in-memory structure called a Memtable. When it's full, it is flushed to disk as an immutable, sorted file (SSTable). Over time, many SSTables pile up. A background Compaction process merges these files, discarding old versions and tombstones (deleted items), creating new merged files.

Memtable is in RAM. Disk holds immutable SSTables.

How it works (LSM Compaction Idea)

def compact(sstable_1, sstable_2):
    # Both are sorted lists of (key, timestamp, value)
    # We do a linear merge, keeping only the newest value per key!
    i, j = 0, 0
    merged = []
    
    while i < len(sstable_1) and j < len(sstable_2):
        k1, t1, v1 = sstable_1[i]
        k2, t2, v2 = sstable_2[j]
        
        if k1 < k2:
            merged.append((k1, t1, v1))
            i += 1
        elif k2 < k1:
            merged.append((k2, t2, v2))
            j += 1
        else:
            # Keys match! Keep the one with the higher timestamp
            if t1 > t2:
                merged.append((k1, t1, v1))
            else:
                merged.append((k2, t2, v2))
            i += 1
            j += 1
            
    return merged