Code Room
On-call | Hard | oc-g671
Subject: Feature pipeline point in time leakage
Level: Senior–Staff
~40 min
Common in ML systems interviews
Industries: Technology

Question

A newly deployed credit-risk model looked excellent offline (AUC 0.92), but online performance is mediocre and defaults among approved applicants are higher than projected. There are no errors and latency is normal. Investigating the feature pipeline, you find that the offline training set was built by joining each application to the feature store WITHOUT a point-in-time-correct lookup: features like 'total_outstanding_debt' and 'num_delinquencies_12m' were pulled at their CURRENT value, which for many historical applications reflects data that only existed AFTER the loan decision (including post-default information). Online serving, of course, only has features as of decision time. How do you triage, confirm, and respond?
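To make the failure mode concrete, here is a minimal sketch contrasting the leaky "current value" lookup with a point-in-time lookup. Everything in it is an assumption for illustration: the applicant IDs, dates, and debt values are made up, and the helpers `current_value` and `value_as_of` are hypothetical names, not part of any feature store API. It also assumes the feature store keeps a dated history per applicant, which the scenario does not state.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history of 'total_outstanding_debt':
# applicant_id -> list of (as_of_date, value), sorted by as_of_date.
debt_history = {
    "A1": [(date(2023, 2, 1), 5000.0), (date(2023, 9, 1), 42000.0)],
    "A2": [(date(2023, 3, 10), 1200.0), (date(2023, 8, 1), 30000.0)],
}


def current_value(history):
    """Leaky lookup: the latest value, regardless of when the decision was made."""
    return history[-1][1] if history else None


def value_as_of(history, decision_date):
    """Point-in-time lookup: the latest value with as_of_date <= decision_date."""
    dates = [d for d, _ in history]
    idx = bisect_right(dates, decision_date)
    return history[idx - 1][1] if idx > 0 else None


# Hypothetical historical applications: (applicant_id, decision_date).
applications = [("A1", date(2023, 3, 1)), ("A2", date(2023, 3, 15))]

# What the offline training set saw (includes post-decision information).
leaky_rows = [(a, current_value(debt_history[a])) for a, _ in applications]

# What online serving would have seen at decision time.
pit_rows = [(a, value_as_of(debt_history[a], d)) for a, d in applications]

print("leaky:", leaky_rows)
print("point-in-time:", pit_rows)
```

On tabular data the same backward-looking join is what `pandas.merge_asof` with `direction="backward"` performs. Whether a value stamped exactly at decision time counts as available is a design choice; this sketch includes it.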

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
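One way to turn "form hypotheses from real signals" into a check is to audit the training set for feature values stamped after the decision they were joined to. The sketch below is only illustrative: the row layout, the `feature_dates` field, and the helper `leakage_rate_by_feature` are hypothetical, and it assumes each feature value carries a timestamp that can be recovered for the audit.

```python
from collections import Counter
from datetime import date

# Hypothetical training rows: each records the decision date and the
# as-of date of every feature value that was joined in.
training_rows = [
    {"application_id": "app-1", "decision_date": date(2023, 3, 1),
     "feature_dates": {"total_outstanding_debt": date(2023, 9, 1),
                       "num_delinquencies_12m": date(2023, 2, 20)}},
    {"application_id": "app-2", "decision_date": date(2023, 3, 15),
     "feature_dates": {"total_outstanding_debt": date(2023, 8, 1),
                       "num_delinquencies_12m": date(2023, 8, 1)}},
    {"application_id": "app-3", "decision_date": date(2023, 4, 2),
     "feature_dates": {"total_outstanding_debt": date(2023, 3, 30),
                       "num_delinquencies_12m": date(2023, 3, 30)}},
    {"application_id": "app-4", "decision_date": date(2023, 5, 10),
     "feature_dates": {"total_outstanding_debt": date(2023, 11, 5),
                       "num_delinquencies_12m": date(2023, 5, 1)}},
]


def leakage_rate_by_feature(rows):
    """Fraction of rows, per feature, whose value is dated after the decision."""
    leaked = Counter()
    seen = Counter()
    for row in rows:
        for name, feature_date in row["feature_dates"].items():
            seen[name] += 1
            if feature_date > row["decision_date"]:
                leaked[name] += 1
    return {name: leaked[name] / seen[name] for name in seen}


rates = leakage_rate_by_feature(training_rows)
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.0%} of rows use post-decision data")
```

A non-zero rate on any feature supports the leakage hypothesis. Rebuilding the evaluation set with point-in-time features and re-scoring the model would then show how much of the offline AUC was due to leaked information.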
