Code Room
Code reviewHardcr-g627
Subject Ml data leakageLevel Senior–Staff~20 minCommon in ML systems interviewsIndustries Software development

Question

Review this Python target-encoding helper.

CV scores are excellent but live performance is poor. Diagnose it.

What a strong answer looks like

Separate real bugs from style. Rank issues by severity, point at the root cause rather than the symptom, and suggest a concrete fix — specific and kind.

Talk through your review
Code to reviewpython
import pandas as pd def target_encode(df, cat_col, target_col):    # replace each category with the mean target for that category    means = df.groupby(cat_col)[target_col].mean()    df[cat_col + "_te"] = df[cat_col].map(means)    return df # usagedf = target_encode(df, "merchant_id", "is_fraud")X = df.drop(columns=["is_fraud"])y = df["is_fraud"]# ... then train_test_split(X, y) and fit
Run or narrate your approach, then ask the coach.