System design | Hard | sd-g423

Subject: Data quality | Level: Mid–Staff | ~40 min | Common in: Distributed systems interviews | Industries: Technology, Software development

Question

Design a data-quality gating system for a daily partitioned warehouse pipeline where bad upstream data (a source bug dumping nulls, a 10x row-count drop, a timezone shift, a duplicate load) currently silently corrupts a finance fact table that ~40 downstream dashboards and an ML feature pipeline depend on. The goal: a bad partition must be detected and BLOCKED from being published/consumed before it poisons downstream, while a healthy partition flows through automatically with no human in the loop. False positives that block good partitions are nearly as costly as missed bad ones.

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
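One way to make the gate concrete is a write-audit-publish flow: the partition lands in staging, a set of checks runs against its profile, and only a clean result promotes it. The sketch below is a minimal illustration of the audit step, not a reference design. The `PartitionStats` shape, the check names, and every threshold (50% volume band, 1% null rate, 0.1% out-of-day rows) are assumptions chosen for the example; a real answer would tune them per table to balance missed bad partitions against false positives.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class PartitionStats:
    """Profile of one staged daily partition (hypothetical shape)."""
    row_count: int
    null_rate: float       # fraction of nulls in a required column
    distinct_keys: int     # distinct primary keys in the partition
    out_of_day_rows: int   # rows whose event time falls outside the partition date


def audit(stats: PartitionStats, trailing_counts: List[int]) -> List[str]:
    """Return the names of failed checks; an empty list means the partition is clean."""
    failures = []
    baseline = median(trailing_counts)

    # Volume: flag a large drop or spike relative to the trailing median.
    # The +/-50% band is an assumed threshold.
    if not 0.5 * baseline <= stats.row_count <= 1.5 * baseline:
        failures.append("row_count")

    # Completeness: a source bug dumping nulls shows up as a null-rate jump.
    if stats.null_rate > 0.01:
        failures.append("null_rate")

    # Uniqueness: a duplicate load yields fewer distinct keys than rows.
    if stats.distinct_keys != stats.row_count:
        failures.append("duplicate_keys")

    # Timezone shift: rows spill outside the partition's calendar day.
    if stats.out_of_day_rows > 0.001 * stats.row_count:
        failures.append("timezone_shift")

    return failures


def gate(stats: PartitionStats, trailing_counts: List[int]) -> str:
    """Decide whether to publish the staged partition or quarantine it."""
    failures = audit(stats, trailing_counts)
    return "publish" if not failures else "quarantine:" + ",".join(failures)
```

Returning the failed check names rather than a bare boolean keeps the quarantine decision explainable, which matters when a false positive has to be overridden quickly.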
