Question
After a new pricing-model release, the model's online performance is much worse than its glowing offline eval: predicted prices are systematically biased low and the business sees margin erosion, though there are no errors. Investigating, you find the offline training pipeline computes 'normalized_demand' with a 7-day rolling window in a batch Spark job, while the online serving path computes the same feature in a separate Java service — and the two implementations disagree (the online one uses a different window boundary and unit). Walk through how you confirm the issue, mitigate, and prevent it.
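Stop the bleeding first: mitigate, for example by rolling back to the previous model, before digging into the cause. Then form hypotheses from real signals, such as the offline and online values of 'normalized_demand' for the same inputs, rather than from guesses. Separate root cause (the two feature implementations disagree on window boundary and unit) from symptom (prices biased low, margin erosion with no errors), communicate status as you go, and close with what prevents a repeat.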
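One way to confirm the skew from real signals is a parity check: take the feature values the offline pipeline produced and the values the online service logged for the same keys, and measure how often they disagree. The sketch below is only an illustration under stated assumptions. The function name `feature_parity_report`, the `(entity_id, timestamp)` key shape, the 1% tolerance, and the sample numbers are all hypothetical and do not come from the actual pipelines.

```python
from statistics import median


def feature_parity_report(offline, online, rel_tol=0.01):
    """Compare offline and online values of one feature on shared keys.

    offline, online: dicts mapping (entity_id, timestamp) -> feature value.
    rel_tol: relative difference above which a pair counts as a mismatch
             (an assumed threshold, tune it for the feature).
    """
    shared = sorted(set(offline) & set(online))
    if not shared:
        raise ValueError("no overlapping keys to compare")

    mismatches = 0
    ratios = []
    for key in shared:
        off, on = offline[key], online[key]
        # Relative difference, guarded against division by zero.
        denom = max(abs(off), abs(on), 1e-12)
        if abs(off - on) / denom > rel_tol:
            mismatches += 1
        if off != 0:
            ratios.append(on / off)

    return {
        "n_compared": len(shared),
        "mismatch_rate": mismatches / len(shared),
        # A stable ratio far from 1 hints at a unit or scaling difference;
        # a noisy ratio is more consistent with a window-boundary difference.
        "median_online_to_offline_ratio": median(ratios) if ratios else None,
    }


# Made-up sample values for illustration only.
offline_values = {
    ("sku-1", "2024-01-08"): 0.50,
    ("sku-2", "2024-01-08"): 0.20,
    ("sku-3", "2024-01-08"): 0.80,
}
online_values = {
    ("sku-1", "2024-01-08"): 0.05,
    ("sku-2", "2024-01-08"): 0.02,
    ("sku-3", "2024-01-08"): 0.08,
}
report = feature_parity_report(offline_values, online_values)
print(report)
```

The same check can serve the prevention step: run on a sample of logged serving requests before each release, a mismatch rate above an agreed threshold could block the rollout. That gating policy is a suggestion, not something the question specifies.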