Code review · Medium · cr-g621

Subject: ML data leakage · Level: Mid–Senior · ~18 min · Common in ML systems interviews · Industries: Software development

Question

Review this Python preprocessing pipeline.

The reported test accuracy looks strong and stable. What's wrong?

What a strong answer looks like

Separate real bugs from style nits. Rank issues by severity, point at the root cause rather than the symptom, and suggest a concrete fix. Be specific and kind.


Code to review (Python)

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def train(X, y):
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)  # normalize all features
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_scaled, y, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_tr, y_tr)
    preds = clf.predict(X_te)
    print("test acc:", accuracy_score(y_te, preds))
    return clf, scaler
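For comparison, here is one possible leakage-free variant. Given the subject line, the issue the exercise most likely targets is that the scaler is fit on all rows before the split, so test-set statistics shape the training features. The sketch below splits first and lets a scikit-learn pipeline fit the scaler on the training rows only. The function name `train_no_leak`, its return values, and the synthetic data are assumptions made for illustration and are not part of the original exercise.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_no_leak(X, y, test_size=0.2, random_state=0):
    """Hypothetical rewrite: split first, then fit preprocessing on train only."""
    # Split before any fitting so held-out rows never influence the scaler.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    # fit() learns the mean and variance from X_tr only; predict() reuses
    # those training statistics to transform X_te.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    return model, acc, X_tr


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 10.0).astype(int)

model, acc, X_tr = train_no_leak(X_demo, y_demo)
# make_pipeline names each step after its lowercased class name.
scaler = model.named_steps["standardscaler"]
print("test acc:", acc)
```

Returning the fitted pipeline as a single object also means that any later cross-validation or inference code applies the same train-only scaling, which is harder to guarantee when the scaler and classifier are returned separately.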
