Code Room
System design · Medium
Question
Design the partitioning scheme for an analytics event store: 20TB/day of clickstream events, each with (event_time, user_id, event_type, ~30 attributes). Queries are mostly 'last 7 days, filter by event_type and a few attributes, aggregate by hour' but a meaningful minority are 'all events for one user_id over 90 days'. Writes are append-only and arrive roughly in event-time order with some lateness. Choose a partitioning (and sub-partitioning/clustering) strategy, and justify it against the two competing query shapes.
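A first scale check is to turn the stated 20TB/day into rates and window sizes. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a perfectly uniform arrival rate. Both are assumptions, not part of the prompt, and real clickstream traffic will peak well above the average. The function name `sizing` is illustrative only.

```python
# Back-of-envelope sizing for the stated ingest of 20TB/day.
# Assumptions (not from the prompt): decimal units (1 TB = 10**12 bytes)
# and uniform arrival over the day; real traffic peaks above this average.

TB = 10**12
GB = 10**9
MB = 10**6

DAILY_BYTES = 20 * TB              # 20TB/day, from the prompt
SECONDS_PER_DAY = 24 * 60 * 60


def sizing(daily_bytes: int) -> dict:
    """Return average ingest rates and raw volumes for the two query windows."""
    return {
        "avg_mb_per_s": daily_bytes / SECONDS_PER_DAY / MB,
        "gb_per_hour": daily_bytes / 24 / GB,
        "tb_last_7_days": daily_bytes * 7 / TB,     # typical analytic query window
        "tb_last_90_days": daily_bytes * 90 / TB,   # per-user lookup window
        "hourly_partitions_90_days": 90 * 24,
    }


if __name__ == "__main__":
    for name, value in sizing(DAILY_BYTES).items():
        print(f"{name:>26}: {value:,.1f}")
```

On these assumptions, ingest averages roughly 231 MB/s, each hour holds roughly 833 GB, a 7-day window covers about 140 TB, and a 90-day window covers about 1.8 PB. If the store is partitioned only by time and has no locality on `user_id`, a single-user 90-day query could, in the worst case, touch all of that 1.8 PB. That gap is the tension between the two query shapes that the partitioning choice has to resolve.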
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.