Code Room
System design · Medium · sd-g438
Subject: Data lineage
Level: Mid–Senior
Time: ~35 min
Common in: Distributed systems interviews
Industries: Technology, Software development

Question

A data platform has ~10k tables, ~4k dbt/Spark transformation jobs, and hundreds of BI dashboards and ML features, all interdependent. An engineer who changes or drops a column cannot see what breaks downstream, so changes ship, silently corrupt dashboards, and are discovered only days later. On-call engineers, in turn, cannot tell which upstream failure caused a given dashboard to go stale. Design a lineage system that supports pre-merge impact analysis ('what does this change break?') and post-incident root-cause analysis ('what upstream broke this dashboard?').

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
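
One possible starting point for the data model is a directed graph over column-level and asset-level nodes, where both questions in the prompt become reachability queries: impact analysis walks downstream, root-cause walks upstream. The sketch below is illustrative only. The class name `LineageGraph`, its methods, and the node identifiers are hypothetical, and it assumes the whole graph fits in memory and that edges have already been extracted from dbt/Spark jobs, which a real design would need to address separately.

```python
from collections import defaultdict, deque


class LineageGraph:
    """In-memory lineage graph (hypothetical sketch, not a production design)."""

    def __init__(self):
        # Adjacency in both directions so either traversal is a plain BFS.
        self._down = defaultdict(set)  # node -> direct consumers
        self._up = defaultdict(set)    # node -> direct producers

    def add_edge(self, source, target):
        """Record that `target` is derived from `source`."""
        self._down[source].add(target)
        self._up[target].add(source)

    @staticmethod
    def _reach(start, adjacency):
        """Return every node reachable from `start`, excluding `start`."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impact(self, node):
        """Pre-merge: everything downstream of a changed or dropped node."""
        return self._reach(node, self._down)

    def root_causes(self, node, failed):
        """Post-incident: upstream nodes of `node` that are known to have failed."""
        return self._reach(node, self._up) & set(failed)


# Assumed example lineage; all identifiers are made up for illustration.
graph = LineageGraph()
graph.add_edge("raw.orders.amount", "stg.orders.amount_usd")
graph.add_edge("stg.orders.amount_usd", "mart.revenue.total")
graph.add_edge("mart.revenue.total", "dashboard:revenue")
graph.add_edge("raw.users.id", "mart.users.count")

broken_by_change = graph.impact("raw.orders.amount")
suspects = graph.root_causes(
    "dashboard:revenue",
    failed={"stg.orders.amount_usd", "raw.users.id"},
)
```

Here `broken_by_change` contains the staging column, the mart column, and the dashboard, while `suspects` contains only `stg.orders.amount_usd`, because `raw.users.id` failed but is not upstream of the revenue dashboard. Storing both adjacency directions doubles edge storage in exchange for symmetric query cost, which is one of the trade-offs a strong answer would name.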
