How the bytes actually sit on disk: a hot head chunk in memory, then a trail of immutable, time-ordered blocks that get merged and aged out.
Recent points are written into a small, mutable head chunk held in memory (backed by a write-ahead log so a crash loses nothing). When that chunk fills a time window, it's flushed to disk as an immutable block.
Background compaction merges several small adjacent blocks into one larger block — fewer files, better compression, faster scans. A retention policy deletes whole blocks once they fall outside the window. A range query just opens the blocks that overlap its time range.
Writes are sequential appends to the head chunk, which is the fastest pattern a disk can offer. Blocks are immutable, so once written they can be memory-mapped, compressed hard (timestamps delta-encoded, values bit-packed), and read concurrently without locks. Compaction and retention are background housekeeping.
def append(point):
wal.append(point) # durability first
head.add(point) # in-memory, sequential
if head.window_full():
flush(head) # write an immutable block, truncate WAL
def flush(head):
block = encode(head) # delta-encode ts, compress values
blocks.append(block) # append-only on disk
head.reset()
def compact(small_blocks): # background
merged = merge(small_blocks) # one bigger, better-compressed block
replace(small_blocks, merged)
def read(start, end):
return [b.read(start, end) for b in blocks if b.overlaps(start, end)]
| Operation | Cost |
|---|---|
| Append | O(1) sequential, in-memory + WAL |
| Flush | O(points in window) one big write |
| Range read | O(overlapping blocks) to open |
The trade-off: immutable blocks make writes cheap and reads lock-free, but you can't edit a point in place — corrections mean a rewrite, and compaction spends disk I/O to keep file count and read amplification low.
A monitoring agent appends CPU samples into a 2-hour head chunk. Every two hours it flushes a block to disk. Overnight, compaction merges twelve 2-hour blocks into one 24-hour block, shrinking it ~10× with delta-of-delta timestamp encoding. After 15 days the retention policy deletes whole day-blocks. A dashboard asking "last 6 hours" opens just the head chunk plus two or three recent blocks — never the months of compacted history.
Why are on-disk blocks made immutable?