Code Room
System design · Medium
Question
A data platform has ~10k tables, ~4k dbt/Spark transformation jobs, and hundreds of BI dashboards and ML features, all interdependent. An engineer who changes or drops a column cannot see what breaks downstream, so changes ship and silently corrupt dashboards, and the damage is discovered only days later. On-call, in turn, cannot tell which upstream failure caused a given dashboard to go stale. Design a lineage system that supports pre-merge impact analysis ("what does this change break?") and post-incident root-cause analysis ("what upstream broke this dashboard?").
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
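As one possible starting point for the data model, the two queries in the question can be framed as reachability over a directed lineage graph: impact analysis walks downstream from the changed column, and root-cause analysis walks upstream from the stale dashboard and intersects the result with assets known to have failed. The sketch below is a minimal in-memory illustration under that assumption, not a production design. The `LineageGraph` class, its method names, and the node identifiers such as `raw.orders.amount` are all hypothetical; a real system would also need to extract edges from dbt/Spark jobs, persist them, and version them.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory lineage graph; nodes are plain string IDs
    for columns, jobs, and dashboards (an assumption for illustration)."""

    def __init__(self):
        # Keep both directions so either query is a simple traversal.
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def add_edge(self, src, dst):
        """Record that dst is derived from src."""
        self.downstream[src].add(dst)
        self.upstream[dst].add(src)

    def _reach(self, start, adjacency):
        """Breadth-first search: every node reachable from start."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen and nxt != start:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impact(self, node):
        """Pre-merge: everything downstream of a changed asset."""
        return self._reach(node, self.downstream)

    def root_cause_candidates(self, node, failed):
        """Post-incident: upstream assets that are known to have failed."""
        return self._reach(node, self.upstream) & set(failed)


# Illustrative edges only; none of these names come from a real platform.
g = LineageGraph()
g.add_edge("raw.orders.amount", "job.daily_revenue")
g.add_edge("job.daily_revenue", "mart.revenue.total")
g.add_edge("mart.revenue.total", "dashboard.finance")
g.add_edge("raw.users.country", "job.user_dim")
g.add_edge("job.user_dim", "dashboard.growth")

affected = g.impact("raw.orders.amount")
suspects = g.root_cause_candidates(
    "dashboard.finance", {"job.daily_revenue", "job.user_dim"}
)
```

Storing both adjacency directions doubles the edge storage but keeps each query a single traversal. At the stated scale, the harder trade-offs are likely to sit elsewhere, for example in how edges are extracted and how stale lineage is detected.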