A database tuned for one shape of data: a metric, a timestamp, and a relentless stream of new points that only ever moves forward.
Imagine a sensor reporting CPU usage every second. You never update old readings — you only append new ones, and you almost always ask the same kind of question: "what did this look like over the last hour?"
A time-series database (TSDB) leans into that pattern. It groups points into time-ordered partitions (one chunk per hour or per day), keeps raw points only for a while, then rolls them up into coarser summaries so old data stays cheap. A range query then touches just the partitions it needs.
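As a rough illustration of that partitioning, the sketch below floors a timestamp to its hourly partition and lists the partitions a range query would touch. The one-hour width and the names PARTITION_SECONDS, partition_key, and partitions_overlapping are assumptions for this example, not any particular database's API.

```python
from typing import List

PARTITION_SECONDS = 3600  # assumed partition width: one hour


def partition_key(ts: int) -> int:
    """Floor a Unix timestamp (seconds) to the start of its partition."""
    return ts - ts % PARTITION_SECONDS


def partitions_overlapping(start: int, end: int) -> List[int]:
    """Keys of every partition that intersects the half-open window [start, end)."""
    if end <= start:
        return []
    first = partition_key(start)
    last = partition_key(end - 1)  # end is exclusive
    return list(range(first, last + 1, PARTITION_SECONDS))


# A 90-minute window from 01:30 to 03:00 touches only the 01:00 and 02:00 partitions.
keys = partitions_overlapping(5400, 10800)
print(keys)
```

However many partitions exist in total, the query only opens the ones whose keys come back from this calculation.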
Writes land in the newest partition in arrival order, so an append is just "add to the end." A background job compacts older partitions into rollups (min / max / avg / count per bucket), and a retention policy drops raw points past their age. A query picks the partitions overlapping its window and reads raw or rolled-up data depending on how old it is.
```python
def write(point):                      # append-only, newest partition
    p = partition_for(point.ts)        # e.g. floor to the hour
    p.raw.append(point)                # O(1) amortised

def query(metric, start, end):
    out = []
    for p in partitions_overlapping(start, end):
        if p.has_raw():                # young data: full resolution
            out += [x for x in p.raw if start <= x.ts < end]
        else:                          # old data: read the rollup
            out += p.rollup.buckets_in(start, end)
    return out

def compact(p):                        # background, runs on old partitions
    p.rollup = downsample(p.raw)       # min/max/avg/count per bucket
    p.raw = []                         # reclaim space
```
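For a version that actually runs, here is a small in-memory sketch of the same three operations. Everything in it is an assumption made for illustration: the Point, Bucket, Partition, and TinyTSDB names, a single metric, hourly partitions, 60-second rollup buckets, and integer Unix timestamps. A real TSDB would persist to disk, index by metric and tags, and compact in the background.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Union

PARTITION_SECONDS = 3600  # assumed: one partition per hour
BUCKET_SECONDS = 60       # assumed: one rollup bucket per minute


@dataclass
class Point:
    ts: int        # Unix timestamp, seconds
    value: float


@dataclass
class Bucket:
    ts: int        # start of the bucket
    min: float
    max: float
    avg: float
    count: int


@dataclass
class Partition:
    raw: List[Point] = field(default_factory=list)
    rollup: List[Bucket] = field(default_factory=list)


def downsample(points: List[Point]) -> List[Bucket]:
    """Summarise raw points as min/max/avg/count per BUCKET_SECONDS."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for p in points:
        groups[p.ts - p.ts % BUCKET_SECONDS].append(p.value)
    return [Bucket(ts, min(v), max(v), sum(v) / len(v), len(v))
            for ts, v in sorted(groups.items())]


class TinyTSDB:
    def __init__(self) -> None:
        self.partitions: Dict[int, Partition] = defaultdict(Partition)

    def write(self, point: Point) -> None:
        key = point.ts - point.ts % PARTITION_SECONDS
        self.partitions[key].raw.append(point)  # append-only

    def compact(self, key: int) -> None:
        part = self.partitions[key]
        part.rollup = downsample(part.raw)
        part.raw = []  # raw detail is gone from here on

    def query(self, start: int, end: int) -> List[Union[Point, Bucket]]:
        out: List[Union[Point, Bucket]] = []
        for key in sorted(self.partitions):
            if key + PARTITION_SECONDS <= start or key >= end:
                continue  # partition does not overlap [start, end)
            part = self.partitions[key]
            rows = part.raw if part.raw else part.rollup
            # Buckets are matched by their start time, so a bucket that
            # begins before `start` is left out in this simple sketch.
            out += [r for r in rows if start <= r.ts < end]
        return out


db = TinyTSDB()
for t in range(0, 7200, 10):       # two hours of 10-second samples
    db.write(Point(t, float(t % 100)))
db.compact(0)                       # first hour becomes minute rollups
result = db.query(3000, 4200)       # window spans both partitions
print(len(result))
```

The query result mixes two resolutions: minute buckets from the compacted first hour and raw points from the second. A caller has to be prepared for that, which is why many systems let the query state the resolution it wants.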
| Operation | Cost |
|---|---|
| Append a point | O(1) amortised |
| Range query | O(partitions + points read) |
| Storage for old data | O(buckets) after rollup |
The trade-off: rollups make old reads fast and cheap, but you lose per-point detail. Once raw points are compacted you can see the shape of last month, not every single sample.
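To make that loss concrete, the sketch below summarises two different minutes of samples and gets identical rollups. The sample values and the rollup helper are made up for illustration.

```python
def rollup(values):
    """Collapse one bucket of raw samples into (min, max, avg, count)."""
    return (min(values), max(values), sum(values) / len(values), len(values))


# Two hypothetical minutes of 10-second CPU samples (percent).
spike_early = [90.0, 10.0, 20.0, 20.0, 20.0, 20.0]
spike_late = [20.0, 20.0, 20.0, 20.0, 10.0, 90.0]

# Both collapse to (10.0, 90.0, 30.0, 6): the rollup records that a spike
# happened, but not when within the minute or in what order.
same = rollup(spike_early) == rollup(spike_late)
print(rollup(spike_early), same)
```

If the order or exact timing of samples matters for a question you expect to ask later, that is an argument for a longer raw retention window rather than a finer rollup.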
A high-cardinality tag such as user_id can explode one metric into millions of series and blow up memory and index size.
An avg can't be re-averaged across buckets without weighting by count, or your aggregate quietly lies.
You collect cpu.usage per host every 10 seconds across 200 hosts. Raw, that's 8,640 points per host per day, or about 1.7M points per day across the fleet. You keep 7 days raw for incident debugging, then roll up to 1-minute buckets (min/max/avg/count) for 90 days, then 1-hour buckets for two years. A dashboard asking for "p95-ish CPU on a day last month" reads minute rollups, 1,440 buckets per host, instead of 8,640 raw points per host, and renders quickly.
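The averaging pitfall is easiest to see when rolling minute buckets up into an hour bucket, as the retention scheme in this example does. The sketch below merges buckets by weighting each average by its count and compares that with a naive mean of averages. The Bucket class and merge function are illustrative names, and the numbers are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bucket:
    min: float
    max: float
    avg: float
    count: int


def merge(buckets: List[Bucket]) -> Bucket:
    """Combine finer buckets into one coarser bucket without re-reading raw points."""
    total = sum(b.count for b in buckets)
    return Bucket(
        min=min(b.min for b in buckets),
        max=max(b.max for b in buckets),
        avg=sum(b.avg * b.count for b in buckets) / total,  # weight by count
        count=total,
    )


# One full minute (6 samples) and one minute where the host reported only once.
minutes = [Bucket(10.0, 30.0, 20.0, 6), Bucket(90.0, 90.0, 90.0, 1)]

merged = merge(minutes)
naive_avg = sum(b.avg for b in minutes) / len(minutes)

print(merged.avg)  # 30.0, the true mean of all 7 samples
print(naive_avg)   # 55.0, which overstates the load
```

This is why the rollup stores count next to avg: min, max, and count combine directly, but avg can only be merged correctly when each bucket's weight is known.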
A query asks for raw, per-second data from 18 months ago. What likely happens?