Code Room
Code review | Hard | cr-g630
Subject: ML time series validation
Level: Senior–Staff
Duration: ~20 min
Common in ML systems interviews
Industries: Software development

Question

Review this Python validation setup for a demand-forecasting model.

The CV MAE is great but the model misses badly in production. Find the leakage.

What a strong answer looks like

Separate real bugs from style nits. Rank issues by severity, point at the root cause rather than the symptom, and suggest a concrete fix. Keep the feedback specific and kind.

Talk through your review
Code to review (Python)
```python
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error


def cv_forecast(df):
    # df sorted by date; features include lag_1, lag_7, rolling_mean_30
    X = df.drop(columns=["date", "demand"]).values
    y = df["demand"].values
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    maes = []
    for tr, te in kf.split(X):
        m = GradientBoostingRegressor().fit(X[tr], y[tr])
        maes.append(mean_absolute_error(y[te], m.predict(X[te])))
    print("CV MAE:", sum(maes) / len(maes))
```
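One way to make "suggest a concrete fix" tangible is a sketch of forward-chaining splits. The shuffled `KFold` in the snippet trains on rows dated after the test rows, and `lag_1`, `lag_7`, and `rolling_mean_30` are built from neighbouring days' demand, so held-out targets are largely recoverable from the training rows. The helper below, `expanding_window_splits`, is a hypothetical name and not part of the code under review; it assumes rows are sorted by date and that a `gap` of skipped rows between train and test is appropriate for the forecast horizon. scikit-learn's `TimeSeriesSplit` (with its `gap` parameter) provides similar behaviour and would likely be the practical choice.

```python
def expanding_window_splits(n_samples, n_splits=5, gap=0):
    """Yield (train_idx, test_idx) pairs where every training row precedes every test row.

    Assumes rows are already sorted by date. `gap` rows are skipped between
    the end of the training window and the start of the test window.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(n_splits):
        # Test windows are consecutive blocks at the end of the series.
        test_start = n_samples - (n_splits - k) * test_size
        # Training data is everything before the test window, minus the gap.
        train_end = test_start - gap
        if train_end <= 0:
            raise ValueError("gap leaves no training rows for the first fold")
        yield list(range(train_end)), list(range(test_start, test_start + test_size))


# Illustrative toy sizes, not taken from the original data.
splits = list(expanding_window_splits(n_samples=12, n_splits=3, gap=1))
for train_idx, test_idx in splits:
    print("train", train_idx, "test", test_idx)
```

Each fold trains only on the past and evaluates on a later block, which mirrors how the model is used in production. The gap size is a judgment call: it should cover the forecast horizon, and the lag and rolling features should themselves be computed only from values available at prediction time.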