Code Room
System design · Hard · sd-g290
Subject: Columnar storage · Level: Senior–Staff · ~50 min · Common in Storage & CDN interviews · Industries: Technology, Software development

Question

Design the storage and scan layer of a columnar analytics store (think a ClickHouse/Parquet-class engine) for an event warehouse that ingests 30 TB/day of events with ~120 columns each. A typical query filters on event_date in the last 7 days AND country = 'DE' AND event_type = 'purchase', then aggregates revenue by hour. Such queries touch 3–6 of the 120 columns and filter to <2% of rows. Walk through the on-disk physical layout, how a scan prunes data it never reads, and the one decision that most affects scan cost.
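
To make the numbers in the question concrete, here is a rough back-of-envelope sketch of how much data a 7-day query has to consider at each stage. It rests on simplifying assumptions that are not in the question: all columns are equally wide, sizes are pre-compression, and row pruning is perfect. The function name `estimate_scan_tb` is hypothetical.

```python
def estimate_scan_tb(daily_ingest_tb=30.0, days=7, total_columns=120,
                     columns_read=6, row_selectivity=0.02):
    """Estimate TB considered at each stage of the scan.

    Assumptions (illustrative only): equal-width columns, uncompressed
    sizes, and pruning that skips every non-matching row.
    """
    # All data inside the 7-day window.
    window_tb = daily_ingest_tb * days
    # Column projection: only the referenced columns are read.
    projected_tb = window_tb * columns_read / total_columns
    # Ideal row pruning: only the matching fraction of rows is read.
    pruned_tb = projected_tb * row_selectivity
    return window_tb, projected_tb, pruned_tb


window, projected, pruned = estimate_scan_tb()
print(f"window={window:.1f} TB, projected={projected:.1f} TB, pruned={pruned:.2f} TB")
```

Under these assumptions the window holds 210 TB, column projection cuts that to about 10.5 TB, and ideal pruning leaves roughly 0.21 TB. Real column widths, compression ratios, and pruning effectiveness will move these figures considerably.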

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
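
One of the hard parts named in the question, pruning data the scan never reads, can be illustrated with a minimal sketch of metadata-based row-group skipping. This is one possible approach and not a prescribed answer. The `RowGroupStats` layout, the per-group country set (standing in for min/max or dictionary statistics), and the `prune` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupStats:
    """Per-row-group metadata kept apart from the data (hypothetical layout)."""
    group_id: int
    min_date: str          # ISO dates compare correctly as strings
    max_date: str
    countries: frozenset   # distinct country values present in the group


def prune(groups, date_lo, date_hi, country):
    """Return ids of row groups that might contain matching rows."""
    keep = []
    for g in groups:
        # Skip groups whose date range cannot overlap the query window.
        if g.max_date < date_lo or g.min_date > date_hi:
            continue
        # Skip groups that contain no rows for the requested country.
        if country not in g.countries:
            continue
        keep.append(g.group_id)
    return keep


groups = [
    RowGroupStats(0, "2024-05-01", "2024-05-01", frozenset({"US", "FR"})),
    RowGroupStats(1, "2024-05-02", "2024-05-02", frozenset({"DE"})),
    RowGroupStats(2, "2024-04-20", "2024-04-21", frozenset({"DE"})),
    RowGroupStats(3, "2024-05-06", "2024-05-08", frozenset({"DE", "AT"})),
]
survivors = prune(groups, "2024-05-01", "2024-05-07", "DE")
print(survivors)  # [1, 3]
```

Statistics like these only skip data when rows with similar filter values are stored near each other. If every row group contains every country, the country check eliminates nothing, so how rows are partitioned and ordered on disk is worth discussing alongside the metadata itself.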
