On-call | Medium | oc-g666

Subject: Feature pipeline flag bad inputs | Level: Mid–Senior | ~30 min | Common in ML systems interviews | Industries: Technology

Question

A feature-flag rollout enables a new 'enriched location' input path for 20% of traffic to your delivery-ETA model. Shortly after the flag ramps, the model's predictions for that 20% go haywire, producing wildly large ETAs, while the other 80% are normal. The model serves 200s with no errors and normal latency. Dashboards show that requests on the flagged path pass a 'distance_km' feature that is sometimes negative or in the hundreds of thousands. The new enrichment code computes distance from a lat/lng pair, but the flag's code path swaps latitude and longitude and occasionally passes raw meters where the model expects km. The flag is at 20% and scheduled to ramp to 100% in an hour. How do you triage and respond?
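To make the failure mode concrete, here is a minimal sketch of an input check that could sit in front of the model and catch the bad values described in the question. The function name `validate_location_features` and the `MAX_PLAUSIBLE_KM` threshold are hypothetical, chosen only for illustration; a real limit would come from the historical distribution of the feature.

```python
import math

# Assumption: delivery distances above this are treated as implausible.
MAX_PLAUSIBLE_KM = 200.0


def validate_location_features(lat, lng, distance_km, max_km=MAX_PLAUSIBLE_KM):
    """Return a list of problem codes; an empty list means the inputs look sane."""
    problems = []
    # Latitude must lie in [-90, 90]; a value outside often means lat/lng were swapped.
    if not -90.0 <= lat <= 90.0:
        problems.append("lat_out_of_range")
    if not -180.0 <= lng <= 180.0:
        problems.append("lng_out_of_range")
    # NaN and infinity fail the finiteness check before any range comparison.
    if not math.isfinite(distance_km):
        problems.append("distance_not_finite")
    elif distance_km < 0:
        problems.append("distance_negative")
    elif distance_km > max_km:
        # Values in the thousands or more may be meters passed as km.
        problems.append("distance_implausibly_large")
    return problems


if __name__ == "__main__":
    print(validate_location_features(37.77, -122.42, 4.2))
    print(validate_location_features(-122.42, 37.77, 4200.0))
```

A range check like this only catches a lat/lng swap when the swapped value falls outside the valid latitude range; a swapped pair whose values are both within [-90, 90] passes, so the distance bounds remain the more reliable signal.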

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
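The "mitigate first" step can be sketched as a simple decision rule that compares the rate of invalid inputs on the flagged cohort against the control cohort. Everything here is an assumption made for illustration: the `CohortStats` type, the `triage_action` function, the action names, and the `min_rate` and `ratio` thresholds are hypothetical and not part of any particular feature-flag system.

```python
from dataclasses import dataclass


@dataclass
class CohortStats:
    requests: int
    invalid_inputs: int

    @property
    def invalid_rate(self) -> float:
        # Guard against an empty cohort to avoid division by zero.
        return self.invalid_inputs / self.requests if self.requests else 0.0


def triage_action(flagged, control, min_rate=0.01, ratio=5.0):
    """Pick a mitigation by comparing invalid-input rates across cohorts.

    Assumed thresholds: min_rate is the smallest flagged-path rate worth
    acting on; ratio is how much worse than control it must be.
    """
    f, c = flagged.invalid_rate, control.invalid_rate
    if f >= min_rate and f > ratio * c:
        # Clear regression isolated to the flag: turn it off and stop the ramp.
        return "disable_flag_and_cancel_ramp"
    if f > c:
        # Worse than control but below the action threshold: hold and look closer.
        return "pause_ramp_and_investigate"
    return "proceed"


if __name__ == "__main__":
    flagged = CohortStats(requests=2000, invalid_inputs=300)
    control = CohortStats(requests=8000, invalid_inputs=4)
    print(triage_action(flagged, control))
```

In the scenario above, the flagged path is far worse than control, so the rule points at disabling the flag and cancelling the scheduled ramp before any root-cause work begins.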
