Code Room
Code reviewMediumcr-g621
Subject Ml data leakageLevel Mid–Senior~18 minCommon in ML systems interviewsIndustries Software development

Question

Review this Python preprocessing pipeline.

The reported test accuracy looks strong and stable. What's wrong?

What a strong answer looks like

Separate real bugs from style. Rank issues by severity, point at the root cause rather than the symptom, and suggest a concrete fix — specific and kind.

Talk through your review
Code to reviewpython
import numpy as npfrom sklearn.preprocessing import StandardScalerfrom sklearn.model_selection import train_test_splitfrom sklearn.linear_model import LogisticRegressionfrom sklearn.metrics import accuracy_score def train(X, y):    scaler = StandardScaler()    X_scaled = scaler.fit_transform(X)  # normalize all features    X_tr, X_te, y_tr, y_te = train_test_split(        X_scaled, y, test_size=0.2, random_state=0)    clf = LogisticRegression(max_iter=1000)    clf.fit(X_tr, y_tr)    preds = clf.predict(X_te)    print("test acc:", accuracy_score(y_te, preds))    return clf, scaler
Run or narrate your approach, then ask the coach.