Code Room
System design · Hard
Question
Design the storage and scan layer of a columnar analytics store (think a ClickHouse/Parquet-class engine) for an event warehouse: 30 TB/day of events, each with ~120 columns. A representative query filters on event_date in the last 7 days AND country = 'DE' AND event_type = 'purchase', then aggregates revenue by hour. Typical queries touch 3–6 of the 120 columns and filter to <2% of rows. Walk through the on-disk physical layout, how a scan prunes data it never reads, and the one decision that most affects scan cost.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
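As one way to "clarify scale first", the numbers in the question can be turned into a rough scan budget, followed by a toy version of min/max (zone map) pruning. This is a minimal sketch, not a reference answer: it assumes all columns are equally wide, ignores compression, and reads "<2% of rows" as applying within the 7-day window. The names `scan_estimate`, `Granule`, and `granules_to_read` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


def scan_estimate(daily_tb, days, cols_read, cols_total, row_fraction):
    """Return (window, after column projection, after ideal row pruning) in TB.

    Assumption: every column has the same width and nothing is compressed.
    """
    window = daily_tb * days                      # bytes in the queried date range
    projected = window * cols_read / cols_total   # only the referenced columns
    pruned = projected * row_fraction             # lower bound if pruning were perfect
    return window, projected, pruned


@dataclass(frozen=True)
class Granule:
    """Hypothetical block of rows carrying per-column statistics."""
    min_date: str           # ISO dates, so string comparison matches date order
    max_date: str
    countries: frozenset    # distinct country values present in the block


def granules_to_read(granules, date_from, date_to, country):
    """Keep only the granules whose statistics cannot rule out a match."""
    return [
        g for g in granules
        if g.max_date >= date_from
        and g.min_date <= date_to
        and country in g.countries
    ]


# Figures from the question, worst case of 6 columns read.
window_tb, projected_tb, pruned_tb = scan_estimate(30, 7, 6, 120, 0.02)

granules = [
    Granule("2024-05-01", "2024-05-01", frozenset({"DE", "FR"})),
    Granule("2024-05-01", "2024-05-02", frozenset({"US"})),
    Granule("2024-04-01", "2024-04-02", frozenset({"DE"})),
]
kept = granules_to_read(granules, "2024-04-28", "2024-05-04", "DE")
```

Under these assumptions the 7-day window is about 210 TB, column projection brings that to roughly 10.5 TB, and perfect row pruning would leave around 0.2 TB. How close a real scan gets to that last figure depends on how well the sort order clusters the filtered columns, since statistics can only skip a granule when its value range excludes the predicate.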
Run or narrate your approach, then ask the coach.