System design · Hard · sd-g422

Subject: Backfill · Level: Senior–Staff · ~45 min · Common in Networking & APIs interviews · Industries: Technology, Software development

Question

A live streaming pipeline continuously ingests events into a partitioned fact table that powers production dashboards and downstream models. You've shipped a corrected transformation and need to backfill the last 180 days, but you CANNOT pause live ingest, and the dashboards must stay queryable throughout. Re-running the new logic over old raw data alongside the live writes risks double-counting at the boundary, partial (torn) partitions visible to readers, and contention with the live job. Design an idempotent backfill that runs concurrently with live ingest, with zero downtime and no double-counting.
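One possible way to make the three risks concrete is the toy, single-process Python sketch below. It is not a production design and rests on assumptions the question does not state: every raw event carries a unique, replay-stable `event_id`; the live job (already running the corrected logic) owns every partition on or after a fixed `cutoff` day; and the table format supports an atomic per-partition pointer flip (here a lock-guarded dict stands in for that metadata commit). `Event`, `Catalog`, `corrected_transform`, `backfill_partition`, and `snapshot_id` are hypothetical names. The cutoff split keeps the two writers off the same partition, staging plus compare-and-swap keeps readers off torn partitions, and a deterministic version id makes retries no-ops.

```python
from dataclasses import dataclass
from datetime import date
import threading


@dataclass(frozen=True)
class Event:
    event_id: str  # assumed unique and stable across replays of the raw log
    day: date      # event-time partition key
    value: int


def corrected_transform(events):
    """Hypothetical stand-in for the corrected logic: dedupe by event_id, then sum."""
    unique = {e.event_id: e for e in events}  # a replayed event overwrites itself
    return sum(e.value for e in unique.values())


class Catalog:
    """Toy metadata store. Readers only ever see the version a partition points to."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pointer = {}  # day -> published version id
        self._data = {}     # version id -> materialized partition result

    def current_version(self, day):
        with self._lock:
            return self._pointer.get(day)

    def write_staging(self, version, result):
        # Staged output is invisible to readers until a pointer references it.
        with self._lock:
            self._data[version] = result

    def compare_and_swap(self, day, expected, new):
        # Atomic pointer flip: a reader sees the old version or the new one, never a mix.
        with self._lock:
            if self._pointer.get(day) != expected:
                return False
            self._pointer[day] = new
            return True

    def read(self, day):
        with self._lock:
            version = self._pointer.get(day)
            return None if version is None else self._data[version]


def backfill_partition(catalog, raw_events, day, snapshot_id, cutoff):
    # Ownership split: the live job owns day >= cutoff, so the two writers never
    # touch the same partition and no event is counted by both.
    if day >= cutoff:
        return "skipped: owned by live ingest"
    # Deterministic version id: same partition + same raw snapshot -> same output.
    version = f"{day.isoformat()}@{snapshot_id}"
    expected = catalog.current_version(day)
    if expected == version:
        return "already published"  # retrying a completed partition is a no-op
    result = corrected_transform(e for e in raw_events if e.day == day)
    catalog.write_staging(version, result)
    if catalog.compare_and_swap(day, expected, version):
        return "published"
    return "conflict: pointer moved, retry this partition"


# Usage: an old partition whose legacy output double-counted a replayed event.
cutoff = date(2024, 7, 1)
day = date(2024, 6, 30)
raw = [Event("a", day, 5), Event("a", day, 5), Event("b", day, 3)]  # "a" replayed

catalog = Catalog()
catalog.write_staging("2024-06-30@legacy", 13)  # legacy result: 5 + 5 + 3
catalog.compare_and_swap(day, None, "2024-06-30@legacy")

print(catalog.read(day))                                           # 13
print(backfill_partition(catalog, raw, day, "snap-1", cutoff))     # published
print(catalog.read(day))                                           # 8
print(backfill_partition(catalog, raw, day, "snap-1", cutoff))     # already published
print(backfill_partition(catalog, raw, cutoff, "snap-1", cutoff))  # skipped: owned by live ingest
```

Late events that arrive for days before the cutoff are not handled in this sketch; under its assumptions they would land in raw storage and be picked up by re-running the same routine against a newer `snapshot_id`, which publishes a new version through the same swap. The ownership split removes write conflicts but not resource contention, so throttling the backfill (for example, a bounded worker pool) would still be needed.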

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
