Code Room
On-callHard
Question
An ops alert fires: your loan-approval model's approval rate jumped from 22% to 61% in two hours with no model change. Dashboards: the model serves 200s, latency normal. Digging in, the feature 'annual_income' is now arriving as 0 or null for ~70% of applicants since 13:00, and the model treats low/zero income with a feature default that happens to push scores toward approval. An upstream data provider changed a JSON field name ('income' → 'incomeAnnual') in a 13:00 release, so your ingestion silently maps the old key to null. Triage and respond.
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
Learn the concepts
Loading whiteboard…
Run or narrate your approach, then ask the coach.