Code Room
System design · Hard · sd-g185
Subject: Data quality · Level: Senior–Staff · ~45 min · Common in: Reliability & on-call interviews · Industries: Technology

Question

An ML feature pipeline computes ~600 features daily that feed recommendation and ranking models in production. Twice this year a silent upstream change (a field went 100% null, a currency unit flipped) corrupted features, the model degraded for days, and revenue dropped before anyone noticed because nothing 'errored'. Design a data-quality and observability system that catches these silent corruptions automatically, distinguishes real anomalies from legitimate shifts (e.g. Black Friday traffic), and decides whether to block a bad feature from reaching the model.
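One way to make the two named failures concrete is a minimal Python sketch. It assumes each feature's daily batch is summarized into a small profile (null rate, median of non-null values, row count) and compared against profiles from the same weekday in prior weeks. `FeatureProfile`, `check_feature`, every threshold, and the `known_event` flag (a stand-in for an event calendar that marks days like Black Friday) are hypothetical. The design choice it illustrates: invariants such as null rate and unit scale are never relaxed, while volume-style checks only get a wider tolerance on known events.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeatureProfile:
    """Hypothetical daily summary of one feature column."""
    null_rate: float          # fraction of rows where the feature is null
    median: Optional[float]   # median of non-null values; None if all null
    row_count: int


@dataclass
class Finding:
    check: str
    severity: str             # "block" or "warn"
    detail: str


def robust_z(value: float, history: List[float]) -> float:
    """Median/MAD z-score: one past outlier does not inflate the baseline."""
    med = statistics.median(history)
    mad = statistics.median([abs(h - med) for h in history])
    if mad == 0:
        return 0.0 if value == med else math.inf
    return (value - med) / (1.4826 * mad)  # 1.4826 scales MAD to a normal sigma


def check_feature(today: FeatureProfile, history: List[FeatureProfile],
                  known_event: bool = False, null_jump: float = 0.2,
                  z_limit: float = 4.0, event_z_limit: float = 8.0) -> List[Finding]:
    """Compare today's profile with same-weekday history (thresholds are assumptions)."""
    findings = []

    # 1. Null-rate jump (field silently dropped upstream). Never relaxed for events.
    base_null = statistics.median([p.null_rate for p in history])
    if today.null_rate - base_null > null_jump:
        findings.append(Finding("null_rate", "block",
                                f"{base_null:.2f} -> {today.null_rate:.2f}"))

    # 2. Scale flip: median moves by an order of magnitude or more
    #    (dollars -> cents is 100x). Assumes a positive-valued feature.
    medians = [p.median for p in history if p.median is not None]
    if today.median is not None and medians:
        base_median = statistics.median(medians)
        if base_median > 0 and today.median > 0:
            ratio = today.median / base_median
            if abs(math.log10(ratio)) >= 1.0:
                findings.append(Finding("scale", "block", f"median ratio {ratio:.3g}"))

    # 3. Volume never blocks on its own. A known event (e.g. Black Friday)
    #    widens the tolerance so expected spikes do not even warn.
    limit = event_z_limit if known_event else z_limit
    z = robust_z(today.row_count, [p.row_count for p in history])
    if abs(z) > limit:
        findings.append(Finding("volume", "warn", f"robust z={z:.1f}"))

    return findings
```

With eight same-weekday profiles as history, a day where the field is entirely null yields a `null_rate` block, a dollars-to-cents flip (median roughly 100x) yields a `scale` block, and a 16% volume bump produces a `volume` warning on an ordinary day but nothing when `known_event=True`.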

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
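The blocking decision and its trade-off can be sketched as a small per-run policy, under stated assumptions: each feature arrives with the severities produced by quality checks like those the question describes, plus the age of its last-known-good snapshot. `Action`, `GateInput`, `decide`, and both thresholds are hypothetical. Blocking is not free, because the model then sees stale values, so this sketch falls back per feature only when few features are affected and the snapshot is recent, and otherwise holds the entire publish.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Action(Enum):
    PUBLISH = "publish"    # today's values reach the model
    FALLBACK = "fallback"  # serve the feature's last-known-good values instead
    HOLD_ALL = "hold_all"  # freeze the whole publish and page on-call


@dataclass
class GateInput:
    severities: List[str]   # e.g. [], ["warn"], ["block"] from the quality checks
    snapshot_age_days: int  # age of this feature's last-known-good snapshot


def decide(features: Dict[str, GateInput], max_stale_days: int = 3,
           max_blocked_fraction: float = 0.1) -> Dict[str, Action]:
    """Hypothetical gating policy; thresholds are illustrative, not tuned."""
    blocked = {name for name, g in features.items() if "block" in g.severities}

    # Many simultaneous blocks usually mean a shared upstream break, and patching
    # them all with stale values would feed the model an inconsistent mix.
    too_many = len(blocked) > max_blocked_fraction * len(features)
    # Stale fallbacks are their own silent degradation, so cap how old they may be.
    too_stale = any(features[n].snapshot_age_days > max_stale_days for n in blocked)
    if too_many or too_stale:
        return {name: Action.HOLD_ALL for name in features}

    # Warnings alert but do not block: a false positive costs an on-call look,
    # while a false block costs model freshness.
    return {name: Action.FALLBACK if name in blocked else Action.PUBLISH
            for name in features}
```

For example, with 20 features where one is blocked and has a one-day-old snapshot, that feature gets `FALLBACK` and the rest `PUBLISH`; three blocked features (over 10%) or a five-day-old snapshot trips `HOLD_ALL`.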
