How databases detect bit-rot and silent disk failures.
In LSM-tree databases (Cassandra, RocksDB), data on disk is stored in SSTables (Sorted String Tables). These files are immutable: once written, they are never modified in place. That is great for write performance, but what happens if a cosmic ray flips a bit, or the disk slowly degrades (bit-rot)? Nothing ever rewrites the file, so the damage just sits there until the next read. If the database read a corrupted SSTable without checking it, it could return garbage data to the user without realizing it!
To detect silent corruption, SSTables store a checksum (commonly a CRC32 variant) alongside every block of data. When the database reads a block from disk into RAM, it recomputes the checksum over the raw bytes. If the computed value does not match the stored one, the database knows the block cannot be trusted, whether the fault lies in the disk, the controller, or somewhere else along the read path. It fails the read with an error rather than serving bad data.
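# Pseudo-code for reading an SSTable block safely.
# sstable_file.read, sstable_file.read_checksum, and parse_records are placeholder helpers.
import zlib

class CorruptionError(Exception):
    pass

def read_block(sstable_file, offset, size):
    # 1. Read the raw bytes and the checksum stored alongside the block
    raw_data = sstable_file.read(offset, size)
    saved_checksum = sstable_file.read_checksum(offset)
    # 2. Verify integrity before parsing anything
    actual_checksum = zlib.crc32(raw_data)
    if actual_checksum != saved_checksum:
        raise CorruptionError(f"Checksum mismatch at offset {offset}: block is corrupt")
    return parse_records(raw_data)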
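For a version you can actually run, here is a minimal sketch of both the write side and the read side. It assumes a made-up layout in which every block is followed by a 4-byte little-endian CRC32 trailer; the real Cassandra and RocksDB file formats differ. The names `write_block`, `read_block_verified`, `flip_bit`, and `CorruptionError` are illustrative and not part of either database.

```python
import io
import struct
import zlib


class CorruptionError(Exception):
    """Raised when the stored and recomputed checksums disagree."""


def write_block(f, payload: bytes) -> int:
    """Append payload plus a 4-byte CRC32 trailer; return the block's offset."""
    offset = f.seek(0, io.SEEK_END)
    f.write(payload)
    f.write(struct.pack("<I", zlib.crc32(payload)))
    return offset


def read_block_verified(f, offset: int, size: int) -> bytes:
    """Read `size` bytes at `offset` and check them against the trailer."""
    f.seek(offset)
    payload = f.read(size)
    trailer = f.read(4)
    if len(payload) != size or len(trailer) != 4:
        raise CorruptionError(f"truncated block at offset {offset}")
    (saved,) = struct.unpack("<I", trailer)
    if zlib.crc32(payload) != saved:
        raise CorruptionError(f"checksum mismatch at offset {offset}")
    return payload


def flip_bit(f, byte_offset: int, bit: int) -> None:
    """Simulate bit-rot by flipping a single bit in place."""
    f.seek(byte_offset)
    original = f.read(1)[0]
    f.seek(byte_offset)
    f.write(bytes([original ^ (1 << bit)]))
```

To try it, write a block into an `io.BytesIO()` buffer, read it back with `read_block_verified`, then call `flip_bit` on any byte of the payload and read again: the second read raises `CorruptionError` instead of returning the altered bytes.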
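Computing CRC32 is cheap on modern CPUs, which often accelerate it in hardware. The real cost comes after corruption is detected. The checksum only says that something in the block changed, not which bit, so a single flipped bit makes the entire block untrustworthy, and the database cannot fix it from local data alone. In a replicated cluster such as Cassandra, the usual remedy is to take the damaged SSTable out of service and run a repair that streams the affected data from healthy replicas on other nodes. An embedded engine such as RocksDB has no replicas of its own, so recovery falls to the application or to a backup.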
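The recovery idea can be sketched in a few lines. This is a simplified illustration, not how Cassandra or any other database implements repair: `Replica`, `read_with_repair`, and the in-memory dictionaries standing in for nodes are all hypothetical, and a real system would stream data over the network and rewrite whole files rather than patch a dictionary entry.

```python
import zlib


class CorruptionError(Exception):
    """Raised when a block fails checksum verification."""


class Replica:
    """Hypothetical node that maps block id -> (payload, stored CRC32)."""

    def __init__(self, name: str):
        self.name = name
        self.blocks = {}

    def put(self, block_id: str, payload: bytes) -> None:
        self.blocks[block_id] = (payload, zlib.crc32(payload))

    def get(self, block_id: str) -> bytes:
        payload, saved = self.blocks[block_id]
        if zlib.crc32(payload) != saved:
            raise CorruptionError(f"{self.name}: bad checksum for {block_id}")
        return payload


def read_with_repair(local: Replica, peers: list, block_id: str) -> bytes:
    """Serve from the local copy; on corruption, fetch a verified copy
    from a peer and overwrite the damaged local block."""
    try:
        return local.get(block_id)
    except CorruptionError:
        for peer in peers:
            try:
                clean = peer.get(block_id)
            except (CorruptionError, KeyError):
                continue  # this peer cannot help; try the next one
            local.put(block_id, clean)  # replace the damaged copy
            return clean
        raise  # no healthy copy anywhere: surface the original error
```

Note that the peer's copy is verified too before it is accepted. Copying an unverified block from another node would only spread the damage.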