System design · Hard · sd-g183

Subject: Data lineage · Level: Senior–Staff · ~45 min · Common in: Distributed systems interviews · Industries: Technology

Question

A large data platform has ~12k tables, ~3k dbt/Spark jobs, and dozens of BI dashboards. When an upstream column is renamed or a source feed breaks, nobody can tell which downstream tables and executive dashboards are affected until something breaks in a board meeting. Design a data-lineage system that captures table- and column-level lineage automatically, lets an engineer do impact analysis ("if I change this column, what breaks?"), and powers freshness and incident alerting. Cover capture, storage, and how the lineage stays accurate as pipelines change daily.

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
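As one illustration of the data-model part, here is a minimal sketch of column-level lineage held as a directed graph, with impact analysis as a breadth-first search over downstream edges. It is an assumption-laden toy, not a reference design: the `Column` and `LineageGraph` types, the dataset names, and the job identifiers are all hypothetical, and a real system at this scale would persist the edges in a graph or relational store rather than in memory.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    """A lineage node: a table column or a dashboard field (hypothetical model)."""
    dataset: str  # e.g. "raw.orders" or "dash.exec_revenue" (made-up names)
    name: str


@dataclass
class LineageGraph:
    # Adjacency list: upstream column -> set of (downstream column, producing job).
    downstream: dict = field(default_factory=lambda: defaultdict(set))

    def add_edge(self, src: Column, dst: Column, job: str) -> None:
        """Record that `job` derives `dst` from `src`."""
        self.downstream[src].add((dst, job))

    def impact(self, start: Column) -> dict:
        """Return every column reachable downstream of `start`,
        mapped to its hop distance. The `seen` map also guards against cycles."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt, _job in self.downstream.get(current, ()):
                if nxt not in seen:
                    seen[nxt] = seen[current] + 1
                    queue.append(nxt)
        del seen[start]  # the changed column itself is not "impacted"
        return seen


# Hypothetical three-hop pipeline: raw feed -> staging -> fact table -> dashboard.
g = LineageGraph()
raw = Column("raw.orders", "amount")
stg = Column("stg.orders", "amount_usd")
fct = Column("fct.revenue", "total_usd")
dash = Column("dash.exec_revenue", "Total revenue")
g.add_edge(raw, stg, job="dbt:stg_orders")
g.add_edge(stg, fct, job="spark:fct_revenue")
g.add_edge(fct, dash, job="bi:exec_revenue")

affected = g.impact(raw)
affected_datasets = sorted({c.dataset for c in affected})
```

Keeping the producing job on each edge is a deliberate choice in this sketch: it lets an answer discuss replacing all edges from a job whenever that job is redeployed, which is one way to keep lineage accurate as pipelines change daily.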

