Code Room
System design (Hard)
Question
A bug in a transformation under-counted user watch-minutes for the last 90 days across a partitioned fact table (~3 TB per daily partition, 90 partitions). You must backfill the corrected values without taking the table offline for live dashboards, without double-counting, and without saturating the shared cluster that also runs daily pipelines and ad-hoc analyst queries. Design the backfill: how you reprocess 90 days correctly, how you swap in corrected data safely, and how you avoid starving production workloads.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
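As one way to make "clarify scale and constraints first" concrete, the sketch below sizes the job from the numbers in the question and then models a throttled, resumable backfill. It is a minimal illustration, not a reference design: the 30 TB/day budget, the in-memory Table class, and the names overwrite_partition, backfill, done, and max_per_run are all hypothetical stand-ins for whatever the real warehouse and scheduler provide.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

TB_PER_PARTITION = 3   # from the question: ~3 TB per daily partition
DAYS = 90              # from the question: 90 days to reprocess


def total_volume_tb(days=DAYS, tb_per_day=TB_PER_PARTITION):
    """Total data the backfill has to rewrite, in TB."""
    return days * tb_per_day


def backfill_days_needed(total_tb, budget_tb_per_day):
    """Calendar days needed at a fixed daily budget (ceiling division).

    budget_tb_per_day is an assumed cap that leaves headroom for the
    daily pipelines and analyst queries sharing the cluster.
    """
    return -(-total_tb // budget_tb_per_day)


@dataclass
class Table:
    """Hypothetical in-memory stand-in: partition date -> watch-minutes."""
    partitions: dict = field(default_factory=dict)

    def overwrite_partition(self, day, value):
        # Replace the whole partition rather than appending a delta,
        # so re-running a day cannot double-count.
        self.partitions[day] = value


def backfill(table, days, recompute, done, max_per_run):
    """Reprocess at most max_per_run pending partitions, then stop.

    done is a checkpoint set of finished days, so a failed or throttled
    run resumes where it left off instead of starting over.
    """
    processed = 0
    for day in days:
        if day in done:
            continue
        if processed >= max_per_run:
            break
        table.overwrite_partition(day, recompute(day))
        done.add(day)
        processed += 1
    return processed


if __name__ == "__main__":
    total = total_volume_tb()
    print(total)                            # 270
    print(backfill_days_needed(total, 30))  # 9, at an assumed 30 TB/day
```

The two properties worth calling out in an answer are visible here: whole-partition overwrite makes each day idempotent, and the per-run cap plus checkpoint is what keeps the backfill from starving other workloads while remaining safe to restart.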
Run or narrate your approach, then ask the coach.