Code Room
System designHardsd-g189
Subject Data warehouseLevel Senior–Staff~45 minCommon in Databases & SQL interviewsIndustries Technology

Question

An MPP warehouse (Redshift/Snowflake/BigQuery-style) backs analytics for a retailer: a ~10B-row `fact_sales`, a 200M-row `dim_customer`, and a small `dim_store`. Analysts complain that joins between sales and customer are slow and that some queries hammer a few compute nodes while others sit idle. A handful of mega-customers (marketplace resellers) account for a huge fraction of rows. Design the physical layout — distribution/partitioning/clustering and join strategy — so the big fact↔dimension joins are fast and load is balanced.

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.

Narrate your design
Loading whiteboard…
Run or narrate your approach, then ask the coach.