When the ground truth arrives too late to train your machine learning model.
To train a machine learning model, you need Features (the inputs, like a user's browsing history) and Labels (the actual outcome, like "Did they click the ad?"). In many systems, you know the Features immediately, but you have to wait to find out the Label. For an ad click, the wait is only about 5 seconds. But what if you are predicting "Will this credit card transaction result in a chargeback?" It can take 60 days for a bank to finalize a chargeback. This gap between observing the Features and learning the Label is called Label Delay.
Because of Label Delay, you cannot use transactions from the last 60 days to train your model, because you don't yet know which of them will turn out to be chargebacks. If you naively train on yesterday's data, every transaction looks legitimate simply because no chargeback has been reported yet, so the model sees a chargeback rate of roughly 0% and learns the wrong thing. You have to enforce a strict Observation Window and train only on data older than the delay period.
from datetime import datetime, timedelta

# NOTE: get_data and model are placeholders for your own data loader and estimator.

# The naive, incorrect way (labels are immature!)
# We use all data up to today.
training_data = get_data(end_date=datetime.today())

# The correct way (accounting for Label Delay)
# If chargebacks take 60 days to settle, we must discard
# the most recent 60 days of data from our training set.
safe_end_date = datetime.today() - timedelta(days=60)
training_data = get_data(end_date=safe_end_date)
model.fit(training_data)
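The snippet above relies on `get_data` and `model`, which are not defined here. As a minimal, self-contained sketch of the same idea, the example below keeps only transactions older than the delay and shows how a recent fraudulent transaction still looks legitimate today. The `Transaction` record, its field names, the helper functions, and the toy data are all hypothetical and chosen for illustration. The sketch also assumes every chargeback settles within 60 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

LABEL_DELAY = timedelta(days=60)  # assumed settlement window, taken from the text


@dataclass
class Transaction:  # hypothetical record, for illustration only
    tx_date: date
    # Date the chargeback was finalized; None if there never is one.
    chargeback_date: Optional[date] = None


def observed_label(tx: Transaction, as_of: date) -> int:
    """Label visible on `as_of`: 1 only if the chargeback is already finalized."""
    return int(tx.chargeback_date is not None and tx.chargeback_date <= as_of)


def mature_only(txs, as_of, delay=LABEL_DELAY):
    """Keep transactions at least `delay` old, i.e. outside the label-delay window."""
    cutoff = as_of - delay
    return [tx for tx in txs if tx.tx_date <= cutoff]


today = date(2024, 6, 1)
txs = [
    Transaction(date(2024, 2, 1), chargeback_date=date(2024, 3, 20)),   # old, chargeback settled
    Transaction(date(2024, 2, 15)),                                     # old, legitimate
    Transaction(date(2024, 5, 20), chargeback_date=date(2024, 7, 10)),  # recent fraud, not yet known
    Transaction(date(2024, 5, 25)),                                     # recent, legitimate
]

# Naive: label everything as of today. The recent fraud is recorded as 0.
naive_labels = [observed_label(tx, today) for tx in txs]

# Safe: drop the most recent 60 days, then label what remains.
safe = mature_only(txs, today)
safe_labels = [observed_label(tx, today) for tx in safe]

print("naive:", naive_labels)
print("safe: ", safe_labels)
```

In the naive labels, the fraudulent transaction from May 20 is recorded as legitimate because its chargeback has not arrived yet. The filtered set contains only rows whose labels are final under the 60-day assumption.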
The cost of dropping the most recent 60 days of data is that your model is always two months blind. If scammers invent a brand new type of credit card fraud today, those transactions will not enter your training set with reliable labels until two months from now. Your business absorbs the losses during that entire blind spot.
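The size of that blind spot is simple date arithmetic. The sketch below assumes a fixed 60-day delay and a hypothetical helper name, `first_trainable_date`. It ignores retraining schedules, which would only push the date later.

```python
from datetime import date, timedelta


def first_trainable_date(first_seen: date, label_delay_days: int = 60) -> date:
    """Earliest date on which transactions from `first_seen` pass the maturity cutoff."""
    return first_seen + timedelta(days=label_delay_days)


attack_start = date(2024, 6, 1)  # illustrative date for a new fraud pattern
visible_from = first_trainable_date(attack_start)
blind_days = (visible_from - attack_start).days

print(f"Pattern first appears: {attack_start}")
print(f"First usable in training: {visible_from} ({blind_days} days later)")
```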