System designHardsd-g728

Subject Ml feature storeLevel Senior–Staff~40 minCommon in ML systems · Reliability & on-call interviewsIndustries Technology

Question

Design an ML feature-monitoring and observability system that watches features and predictions across ~200 production models. For each model you must detect, within minutes-to-hours, when an input feature's distribution shifts, when a feature pipeline silently starts emitting nulls or stale values, and when prediction distributions drift relative to training. The system ingests roughly 500k predictions/sec in aggregate, must support both real-time alerting and historical investigation, and most models have delayed or no ground-truth labels, so you can't rely on accuracy as the primary signal.

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.

Learn the concepts

Narrate your design

Loading whiteboard…

Run or narrate your approach, then ask the coach.