Question
A live streaming pipeline continuously ingests events into a partitioned fact table that powers production dashboards and downstream models. You've shipped a corrected transformation and need to backfill the last 180 days, but you CANNOT pause live ingest and the dashboards must stay queryable throughout. Re-running the new logic over old raw data alongside the live writes risks double-counting at the boundary between backfilled and live data, partial or torn partitions becoming visible to readers, and contention with the live job. Design an idempotent backfill that runs concurrently with live ingest, with zero downtime and no double-counting.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.