When your ML model gets an A+ in the lab, but immediately fails in production.
In Machine Learning, you train models on historical data in a data warehouse (Offline, e.g., using Python/Pandas). Then, you deploy that model to a web server to make predictions in real time (Online, e.g., using Java/Go). Online-Offline Skew (or Training-Serving Skew) happens when the code that calculates a feature in the Offline environment differs, even slightly, from the code that calculates it in the Online environment. The model is trained on one definition of reality, but forced to make predictions on another. Nothing errors out; the predictions are simply wrong more often.
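Skew is hard to detect because there are no crash logs: the service keeps returning predictions, they are just worse. A common fix is a Feature Store. With a Feature Store, the logic to calculate a feature (like "User's Age") is written exactly once. Both the Offline training job and the Online web server fetch values produced by that single definition, which removes the duplicated logic behind this kind of skew. It does not rule out every mismatch (an Online value can still be stale, for example), but the two environments can no longer disagree on what the feature means.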
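The "written exactly once" idea can be sketched without any Feature Store at all. This is a minimal illustration, not how Feast or Hopsworks work internally: the names `user_age`, `build_training_rows` and `build_serving_features` are made up for this example, and it assumes both environments can run the same Python code.

```python
from datetime import date


def user_age(dob: date, today: date) -> int:
    """Single, shared definition of the feature: completed years of age."""
    years = today.year - dob.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def build_training_rows(users: list, today: date) -> list:
    """Offline path (hypothetical): compute the feature for a batch of records."""
    return [
        {"user_id": u["user_id"], "user_age": user_age(u["dob"], today)}
        for u in users
    ]


def build_serving_features(user: dict, today: date) -> dict:
    """Online path (hypothetical): compute the feature for a single request."""
    return {"user_id": user["user_id"], "user_age": user_age(user["dob"], today)}
```

Because both paths call `user_age`, changing the rounding rule in one place changes it everywhere. A Feature Store takes this one step further by computing the value once and serving the stored result to both sides.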
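// THE CAUSE OF SKEW: Duplicated Logic
// Offline Training (Python/Pandas)
import numpy as np
# Round down to nearest year (assumes df['dob'] is a datetime column and today is a pd.Timestamp)
df['user_age'] = np.floor((today - df['dob']).dt.days / 365).astype(int)
// Online Serving (Java/Spring Boot)
import java.time.temporal.ChronoUnit;
// Round up to nearest year (assumes dob and today are java.time.LocalDate values)
int userAge = (int) Math.ceil(ChronoUnit.DAYS.between(dob, today) / 365.0);
// Result: The model was trained expecting '34',
// but in production it receives '35'. Its predictions quietly get worse.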
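Since nothing crashes, one way to surface this kind of mismatch is to compare the two values for the same users. The sketch below is only an illustration: both code paths are re-implemented in Python as stand-ins (`offline_user_age`, `online_user_age`), and `find_skew` is a hypothetical helper. In a real system the Online value would more likely come from what the serving code logged.

```python
import math
from datetime import date


def offline_user_age(dob: date, today: date) -> int:
    # Stand-in for the training pipeline: round down.
    return math.floor((today - dob).days / 365)


def online_user_age(dob: date, today: date) -> int:
    # Stand-in for the serving code: round up.
    return math.ceil((today - dob).days / 365.0)


def find_skew(users, today):
    """Return (user_id, offline_value, online_value) for every mismatch."""
    mismatches = []
    for user_id, dob in users:
        offline = offline_user_age(dob, today)
        online = online_user_age(dob, today)
        if offline != online:
            mismatches.append((user_id, offline, online))
    return mismatches


users = [("u1", date(1990, 1, 1)), ("u2", date(2000, 3, 10))]
today = date(2024, 6, 1)
for user_id, offline, online in find_skew(users, today):
    print(f"{user_id}: trained on {offline}, served {online}")
```

With floor on one side and ceil on the other, the two values agree only when the day count is an exact multiple of 365, so almost every user shows up as a mismatch.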
Adopting a Feature Store (like Feast or Hopsworks) is a significant architectural undertaking. It typically means running two stores (a fast Online store, such as a Redis cache, for serving, and a large Offline store, such as a Parquet data lake, for training) and keeping the Online store up to date with the Offline one. How fresh the Online values must be depends on the feature, but the syncing pipeline is yours to operate and monitor either way. It adds significant complexity to your infrastructure.
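To make the two-store layout concrete, here is a toy in-memory sketch. It is not the Feast or Hopsworks API: `ToyFeatureStore` and its methods are invented for illustration, a list stands in for the Offline store and a dict for the Online store.

```python
from datetime import date


class ToyFeatureStore:
    """In-memory stand-in: a list plays the Offline store, a dict the Online store."""

    def __init__(self):
        self.offline_rows = []   # full history, used to build training sets
        self.online_latest = {}  # latest value per (user, feature), read at request time

    def ingest(self, user_id, feature, value, as_of):
        # Features are computed once upstream and appended to the Offline history.
        self.offline_rows.append(
            {"user_id": user_id, "feature": feature, "value": value, "as_of": as_of}
        )

    def materialize(self):
        # Copy the most recent value of each feature into the Online store.
        for row in sorted(self.offline_rows, key=lambda r: r["as_of"]):
            self.online_latest[(row["user_id"], row["feature"])] = row["value"]

    def get_online(self, user_id, feature):
        return self.online_latest.get((user_id, feature))

    def get_training_rows(self, feature):
        return [r for r in self.offline_rows if r["feature"] == feature]


store = ToyFeatureStore()
store.ingest("u1", "user_age", 33, date(2023, 6, 1))
store.ingest("u1", "user_age", 34, date(2024, 6, 1))
print(store.get_online("u1", "user_age"))  # nothing is served until materialize() runs
store.materialize()
print(store.get_online("u1", "user_age"))
```

Even in this toy version the operational cost is visible: the Online store only knows what the last `materialize()` run copied over, so that sync job becomes one more thing to schedule and watch.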