Feature stores & MLOps

Preventing training data from accidentally leaking the future.

The idea

To train a fraud model, you look at past transactions. If a transaction happened on Tuesday, you must fetch the user's features (e.g., "login_count") exactly as they were on Tuesday. This is called Point-in-Time Correctness.

If your training query accidentally grabs the user's "login_count" from Friday, you have Feature Leakage: the model trains on data from the future that won't be available during live inference. A Feature Store solves this by "time-traveling": for each training row, it looks up feature values as they stood at that row's timestamp, so offline training data matches what the online model will see.

[Figure: timeline with an As-Of Time marker. Monday: login_count=1. Tuesday (Target): Fraud=True. Friday: login_count=50.]
Training a model to predict the Tuesday Fraud event.
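
To make the leak concrete, here is a minimal sketch of a sanity check that flags training rows whose feature value was recorded after the label it is paired with. The row layout, the field names (`label_ts`, `feature_ts`), the dates, and the helper `find_leaky_rows` are all hypothetical, chosen only to mirror the Monday/Tuesday/Friday timeline above.

```python
from datetime import datetime

# Hypothetical training rows. Each row records when the label event happened
# (label_ts) and when the joined feature value was recorded (feature_ts).
# 2024-01-01 is a Monday, 2024-01-02 a Tuesday, 2024-01-05 a Friday.
training_rows = [
    {"user_id": 1, "label_ts": datetime(2024, 1, 2),
     "feature_ts": datetime(2024, 1, 1), "login_count": 1},
    {"user_id": 1, "label_ts": datetime(2024, 1, 2),
     "feature_ts": datetime(2024, 1, 5), "login_count": 50},
]

def find_leaky_rows(rows):
    """Return rows whose feature value was recorded after the label timestamp."""
    return [r for r in rows if r["feature_ts"] > r["label_ts"]]

leaky = find_leaky_rows(training_rows)
for row in leaky:
    print(f"leak: user {row['user_id']} has a feature from "
          f"{row['feature_ts']:%A} joined to a label on {row['label_ts']:%A}")
```

A check like this only catches leakage after the fact; the point-in-time join is what prevents it in the first place.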

How it works (Point-in-Time Join)

-- BAD: Naive SQL leaks the future
SELECT t.is_fraud, u.login_count
FROM transactions t JOIN users u ON t.user_id = u.id
-- (users holds only the latest value, so this grabs Friday's login_count for Tuesday's transaction!)

# GOOD: Feature Store Point-in-Time Join
# For each row, joins the most recent feature value whose timestamp is <= the label timestamp
feature_store.get_historical_features(
    entity_df=transactions_df, # contains 'user_id' and 'timestamp'
    features=["user:login_count"]
)
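
What a point-in-time join does under the hood can be sketched in plain Python. This is an illustration under stated assumptions, not how any particular feature store implements it: the in-memory `feature_history` layout, the sample dates, and the helpers `login_count_as_of` and `point_in_time_join` are hypothetical. For each transaction, the sketch picks the latest feature value recorded at or before the transaction's timestamp.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature log: user_id -> list of (timestamp, login_count),
# sorted by timestamp ascending.
feature_history = {
    1: [(datetime(2024, 1, 1), 1), (datetime(2024, 1, 5), 50)],
}

# Hypothetical label rows, shaped like the entity_df above
# ('user_id' and 'timestamp' plus the label).
transactions = [
    {"user_id": 1, "timestamp": datetime(2024, 1, 2), "is_fraud": True},
    {"user_id": 2, "timestamp": datetime(2024, 1, 3), "is_fraud": False},
]

def login_count_as_of(user_id, as_of):
    """Latest login_count recorded at or before `as_of`, or None if none exists."""
    history = feature_history.get(user_id, [])
    timestamps = [ts for ts, _ in history]
    # bisect_right counts the entries with timestamp <= as_of.
    idx = bisect_right(timestamps, as_of)
    if idx == 0:
        return None  # no feature value was known yet at that time
    return history[idx - 1][1]

def point_in_time_join(rows):
    """Attach the as-of login_count to every label row."""
    return [
        {**row, "login_count": login_count_as_of(row["user_id"], row["timestamp"])}
        for row in rows
    ]

training_set = point_in_time_join(transactions)
for row in training_set:
    print(row["user_id"], row["is_fraud"], row["login_count"])
```

The Tuesday transaction receives Monday's value rather than Friday's, and a user with no history before the label timestamp gets `None` instead of a value borrowed from the future. Real feature stores typically also bound how far back the lookup may reach, which this sketch omits.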