Code Room
System design · Hard · sd-g440
Subject: Data pipelines · Level: Senior–Staff · ~40 min · Common in Reliability & on-call interviews · Industries: Technology, Software development

Question

An executive dashboard and an ML model that drives daily payouts both depend on a chain of ~30 interdependent jobs spread across 3 orchestrators: Airflow, a Spark scheduler, and a vendor's sync. The business promises that the dashboard is 'fresh by 8am.' Lately it has been late on ~20% of mornings, sometimes because of a single slow upstream job, sometimes because a vendor sync finished but produced empty data, and sometimes because of a retry storm. On-call gets paged at 8am with no idea which of the 30 jobs is the culprit, or whether the data is even trustworthy. Design a data-SLA / freshness system that both guarantees and observes the 8am promise across these heterogeneous orchestrators.
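One possible way to make the 8am promise observable before 8am, rather than at it, is to propagate the deadline backward through the dependency graph so that each of the ~30 jobs gets its own latest-allowed finish time. "Late" and "finished but empty" can then be treated as the same kind of failure, and only the unhealthy jobs whose upstreams are all healthy are reported as likely root causes. The sketch below is a minimal illustration under stated assumptions, not a reference design. The `Job` and `Observation` types, the per-job `expected_runtime` (for example, a historical p95), the `min_rows` emptiness threshold, and the job names in the usage lines are all hypothetical. It also assumes that completion events from Airflow, the Spark scheduler, and the vendor sync have already been normalized into one record shape.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from graphlib import TopologicalSorter


@dataclass
class Job:
    # Hypothetical: a runtime estimate, e.g. the historical p95 duration.
    expected_runtime: timedelta
    upstreams: tuple[str, ...] = ()


@dataclass
class Observation:
    # Hypothetical normalized completion record from any orchestrator.
    finished_at: datetime | None  # None means not finished yet
    row_count: int | None = None


def latest_finish_deadlines(jobs: dict[str, Job], sla: datetime) -> dict[str, datetime]:
    """Give each job the latest finish time that still lets every downstream
    job complete by the SLA, assuming a job starts only after all upstreams finish."""
    downstream: dict[str, list[str]] = {name: [] for name in jobs}
    for name, job in jobs.items():
        for up in job.upstreams:
            downstream[up].append(name)
    # static_order() yields upstreams before downstreams; reverse it so every
    # downstream deadline is known before its upstreams are processed.
    order = list(TopologicalSorter({n: j.upstreams for n, j in jobs.items()}).static_order())
    deadlines: dict[str, datetime] = {}
    for name in reversed(order):
        kids = downstream[name]
        if not kids:
            deadlines[name] = sla  # sink jobs must finish by the SLA itself
        else:
            deadlines[name] = min(deadlines[k] - jobs[k].expected_runtime for k in kids)
    return deadlines


def find_culprits(jobs: dict[str, Job], deadlines: dict[str, datetime],
                  obs: dict[str, Observation], now: datetime, min_rows: int = 1) -> list[str]:
    """Return unhealthy jobs whose upstreams are all healthy (likely root causes)."""
    def unhealthy(name: str) -> bool:
        o = obs.get(name)
        if o is None or o.finished_at is None:
            return now > deadlines[name]  # not done and already past its deadline
        if o.finished_at > deadlines[name]:
            return True  # finished, but too late for downstream jobs
        return o.row_count is not None and o.row_count < min_rows  # finished but empty

    bad = {n for n in jobs if unhealthy(n)}
    return sorted(n for n in bad if not any(u in bad for u in jobs[n].upstreams))


# Usage with hypothetical job names and times.
sla = datetime(2024, 1, 1, 8, 0)
jobs = {
    "vendor_sync": Job(timedelta(minutes=30)),
    "spark_agg": Job(timedelta(minutes=60), ("vendor_sync",)),
    "dashboard": Job(timedelta(minutes=15), ("spark_agg",)),
}
deadlines = latest_finish_deadlines(jobs, sla)  # spark_agg: 07:45, vendor_sync: 06:45
obs = {"vendor_sync": Observation(datetime(2024, 1, 1, 6, 30), row_count=0)}
print(find_culprits(jobs, deadlines, obs, now=datetime(2024, 1, 1, 7, 50)))  # prints ['vendor_sync']
```

In the usage lines, the empty vendor sync is flagged at 7:50 even though it finished on time, and the late Spark job is treated as a downstream symptom rather than a second page. Using a per-job runtime estimate keeps the deadlines orchestrator-agnostic, at the cost of depending on how well those estimates track reality.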

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
