Question
Your fraud model retrains nightly on freshly joined labels (a transaction joined to its chargeback/confirmed-fraud outcome). At 08:00 a 'label freshness' alarm fires: the labeled-training table for the last 36 hours is ~90% empty (almost no transactions are getting labels attached), even though the model is still scoring live traffic fine. Dashboards: the raw transactions stream is healthy and the chargeback/outcome events topic is producing normally, but the join job that attaches outcomes to transactions has had an output-row count near zero since 20:00 yesterday; its run succeeded (exit 0) and emitted no errors. A schema migration last night renamed the transaction key column from 'txn_id' to 'transaction_id' in the outcomes feed only. How do you triage and respond?
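One plausible way the join can "succeed" with near-zero output is schema-on-read: if the job projects outcome records onto a declared schema that still names `txn_id`, the renamed column is read as null instead of raising, and null keys never match. The sketch below is a stdlib-only reproduction under that assumption; `OUTCOME_SCHEMA`, `project_to_schema`, `join_outcomes`, and `join_diagnostics` are hypothetical names, not the job's actual code. The two signals it computes (null-key rate on the outcomes side, and match rate) are the kind of evidence that separates "the keys broke" from "no outcomes arrived".

```python
# Hypothetical reproduction: a schema-on-read outcomes reader turns a renamed
# key column into None, and an inner join on None keys matches nothing.
from typing import Any

# Assumed declared schema for the outcomes feed (pre-migration column names).
OUTCOME_SCHEMA = ("txn_id", "outcome")


def project_to_schema(record: dict[str, Any], schema: tuple[str, ...]) -> dict[str, Any]:
    """Keep only declared columns; a missing column becomes None rather than an error."""
    return {col: record.get(col) for col in schema}


def join_outcomes(transactions: list[dict], outcomes: list[dict]) -> list[dict]:
    """Inner-join transactions to outcomes on txn_id; None keys never match."""
    by_key = {o["txn_id"]: o["outcome"] for o in outcomes if o["txn_id"] is not None}
    return [{**t, "outcome": by_key[t["txn_id"]]} for t in transactions if t["txn_id"] in by_key]


def join_diagnostics(outcomes: list[dict], joined: list[dict]) -> dict[str, Any]:
    """Separate 'no outcomes arrived' from 'outcomes arrived but keys broke'."""
    n = len(outcomes)
    null_keys = sum(1 for o in outcomes if o["txn_id"] is None)
    return {
        "outcome_rows": n,
        "outcome_null_key_rate": null_keys / n if n else 0.0,
        "joined_rows": len(joined),
        "match_rate": len(joined) / n if n else 0.0,
    }


transactions = [{"txn_id": f"t{i}", "amount": 10 * i} for i in range(5)]
# Post-migration outcome events: the key now arrives as transaction_id.
raw_outcomes = [{"transaction_id": f"t{i}", "outcome": "fraud"} for i in range(3)]
outcomes = [project_to_schema(r, OUTCOME_SCHEMA) for r in raw_outcomes]

joined = join_outcomes(transactions, outcomes)
print(join_diagnostics(outcomes, joined))
# → {'outcome_rows': 3, 'outcome_null_key_rate': 1.0, 'joined_rows': 0, 'match_rate': 0.0}
```

A healthy outcome volume combined with a 100% null-key rate points at the key contract, not at the upstream producer.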
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
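For the mitigation and prevention steps, one hedged sketch: a key-normalization shim that accepts both `txn_id` and `transaction_id` so the missed window can be re-joined, plus a post-join guard that raises (and so fails the run with a non-zero exit if left uncaught) when the match rate collapses, instead of exiting 0 with an empty table. `KEY_ALIASES`, `normalize_key`, `guard_join_output`, `JoinContractError`, and the 0.9 threshold are all assumptions for illustration; a real threshold would be tuned to the feed's normal match rate.

```python
# Sketch of a mitigation shim plus a guard that makes a broken join fail loudly.
# Function names, the alias tuple and the 0.9 threshold are hypothetical.
from typing import Any

# Mitigation: accept both the pre- and post-migration key names so the missed
# window can be re-joined (backfilled) while producer and consumer are out of sync.
KEY_ALIASES = ("txn_id", "transaction_id")


class JoinContractError(RuntimeError):
    """Raised so the job exits non-zero instead of 'succeeding' with empty output."""


def normalize_key(record: dict[str, Any]) -> dict[str, Any]:
    """Copy the record with its join key stored under the canonical name 'txn_id'."""
    for name in KEY_ALIASES:
        if record.get(name) is not None:
            out = {k: v for k, v in record.items() if k not in KEY_ALIASES}
            out["txn_id"] = record[name]
            return out
    # Neither key present: a schema contract violation, not a row to drop silently.
    raise JoinContractError(f"record has no join key among {KEY_ALIASES}")


def guard_join_output(outcome_rows: int, joined_rows: int,
                      min_match_rate: float = 0.9) -> float:
    """Return the match rate, raising if too few outcomes found their transaction."""
    if outcome_rows == 0:
        return 0.0  # no input to join; upstream volume alarms cover this case
    rate = joined_rows / outcome_rows
    if rate < min_match_rate:
        raise JoinContractError(
            f"match rate {rate:.1%} < {min_match_rate:.0%} "
            f"({joined_rows}/{outcome_rows} outcome rows joined)"
        )
    return rate


# Usage: mixed old/new key names during the migration window.
raw_outcomes = [
    {"transaction_id": "t1", "outcome": "fraud"},
    {"txn_id": "t2", "outcome": "legit"},
]
known_txns = {"t1", "t2", "t3"}
outcomes = [normalize_key(r) for r in raw_outcomes]
joined = [o for o in outcomes if o["txn_id"] in known_txns]
print(guard_join_output(len(outcomes), len(joined)))  # 1.0

try:
    guard_join_output(outcome_rows=1000, joined_rows=3)
except JoinContractError as err:
    print("job would exit non-zero:", err)
```

Raising rather than logging is the deliberate choice here: the original failure stayed invisible precisely because the job's exit status and error output looked healthy.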