SSTable Durability & Corruption

How databases detect bit-rot and silent disk failures.

The idea

In LSM-tree storage engines (such as Cassandra and RocksDB), data on disk is stored in SSTables (Sorted String Tables). These files are strictly immutable; once written, they are never modified, only replaced later by compaction. This is great for performance, but what happens if a cosmic ray flips a bit on the hard drive, or the disk slowly degrades (bit-rot)? If the database reads a corrupted SSTable without checking it, it might return garbage data to the user without realizing it!

Step 1: An immutable SSTable sits on disk. It contains valid data and a CRC32 Checksum.

How it works (Checksums)

To detect silent corruption, SSTables store a checksum alongside every block of data (CRC32 or a close variant such as CRC32C, depending on the engine). When the database reads a block from disk into RAM, it recalculates the checksum over the raw bytes and compares it with the stored value. If the two don't match, the database knows the block was corrupted at some point after it was written, and it fails the read with an error rather than serving bad data. Note that a checksum only detects corruption; recovering the data requires another copy, such as a replica or a backup.

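# Sketch of reading an SSTable block safely
# Assumed layout (hypothetical): each block is followed by a 4-byte big-endian CRC32
import struct
import zlib

class CorruptionError(Exception):
    """Raised when a block fails its integrity check."""

def read_block(sstable_file, offset, size):
    # 1. Read the raw block bytes and the checksum stored right after them
    sstable_file.seek(offset)
    raw_data = sstable_file.read(size)
    trailer = sstable_file.read(4)
    if len(raw_data) != size or len(trailer) != 4:
        raise CorruptionError(f"Truncated block at offset {offset}")
    (saved_checksum,) = struct.unpack(">I", trailer)

    # 2. Verify integrity before parsing anything
    actual_checksum = zlib.crc32(raw_data)
    if actual_checksum != saved_checksum:
        raise CorruptionError(f"Checksum mismatch at offset {offset}")

    # 3. parse_records is format-specific and not shown here
    return parse_records(raw_data)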
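
To see the check catch a flipped bit, here is a small self-contained sketch. It assumes a hypothetical layout in which each block is followed by its own 4-byte CRC32; the names `encode_block`, `verify_block`, and `flip_bit` are illustrative and do not come from any real database's API.

```python
import struct
import zlib


def encode_block(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC32 to the payload (assumed layout)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def verify_block(block: bytes) -> bytes:
    """Return the payload if its checksum matches, else raise ValueError."""
    if len(block) < 4:
        raise ValueError("block too short to hold a checksum")
    payload, trailer = block[:-4], block[-4:]
    (saved,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) != saved:
        raise ValueError("checksum mismatch: block is corrupted")
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate bit-rot by flipping a single bit."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


if __name__ == "__main__":
    block = encode_block(b"key1=value1;key2=value2")
    # The untouched block verifies and yields the original payload.
    assert verify_block(block) == b"key1=value1;key2=value2"
    try:
        verify_block(flip_bit(block, 13))
    except ValueError as err:
        print("detected:", err)
```

CRC32 is guaranteed to catch any single-bit error, whether the flip lands in the payload or in the stored checksum itself, which is why one damaged bit is enough to fail the whole block.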

Cost

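Calculating CRC32 is extremely fast on modern CPUs (often hardware-accelerated). The true cost is what happens after corruption is detected. A single flipped bit fails the checksum for its entire block, so none of that block's records can be trusted. Recovery then depends on having another copy. In a replicated cluster such as Cassandra, the damaged SSTable is typically scrubbed or removed, and a repair process streams clean data for the affected ranges from other replicas over the network. An embedded engine such as RocksDB has no replicas of its own, so it surfaces the error and leaves recovery to the application or a backup.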
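
Because each checksum covers one block, a scan can pinpoint which blocks are damaged before any repair starts. The following is a minimal sketch of such a scan, assuming fixed-size payloads that are each followed by a 4-byte CRC32; `build_file` and `find_corrupt_blocks` are hypothetical helpers for illustration, not a real scrub tool.

```python
import struct
import zlib
from typing import List

CHECKSUM_SIZE = 4  # bytes used by the CRC32 trailer (assumed layout)


def build_file(payloads: List[bytes]) -> bytes:
    """Concatenate equal-sized payloads, each followed by its CRC32."""
    return b"".join(p + struct.pack(">I", zlib.crc32(p)) for p in payloads)


def find_corrupt_blocks(data: bytes, payload_size: int) -> List[int]:
    """Return the indexes of blocks whose stored CRC32 does not match."""
    stride = payload_size + CHECKSUM_SIZE
    corrupt = []
    for index, start in enumerate(range(0, len(data), stride)):
        chunk = data[start:start + stride]
        if len(chunk) < stride:
            corrupt.append(index)  # truncated final block
            continue
        payload, trailer = chunk[:payload_size], chunk[payload_size:]
        if zlib.crc32(payload) != struct.unpack(">I", trailer)[0]:
            corrupt.append(index)
    return corrupt


if __name__ == "__main__":
    image = bytearray(build_file([b"aaaaaaaa", b"bbbbbbbb", b"cccccccc"]))
    image[14] ^= 0x01  # flip one bit inside the second block
    print(find_corrupt_blocks(bytes(image), payload_size=8))  # prints [1]
```

Only the block containing the flipped bit is reported; the blocks on either side still pass their checks.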

Watch out for