Code Room
On-callHardoc-g573
Subject Feature pipeline data qualityLevel Senior–Staff~35 minCommon in ML systems · Distributed systems interviewsIndustries Technology

Question

An ops alert fires: your loan-approval model's approval rate jumped from 22% to 61% in two hours with no model change. Dashboards: the model serves 200s, latency normal. Digging in, the feature 'annual_income' is now arriving as 0 or null for ~70% of applicants since 13:00, and the model treats low/zero income with a feature default that happens to push scores toward approval. An upstream data provider changed a JSON field name ('income' → 'incomeAnnual') in a 13:00 release, so your ingestion silently maps the old key to null. Triage and respond.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Diagram & narrate the incident
Loading whiteboard…
Run or narrate your approach, then ask the coach.