Question
Your storefront caches each user's top-N recommendations in a key-value cache to avoid re-scoring on every page load. Support tickets spike: users complain the homepage keeps recommending items that are out of stock or that they already bought hours ago, and merchandising says a flash-sale collection that ENDED is still being recommended. The model and scoring path are healthy and fast. Dashboards: the recommendation cache has a 24-hour TTL and a 99% hit rate; there is no event-driven invalidation when inventory, purchases, or campaign state change; the flash sale ended at 00:00 but cached recs from before then are still being served. How do you triage and respond?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
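One possible mitigation, sketched in Python under stated assumptions: filter cached recommendations against live state at serve time, and drop a user's entry when a relevant event arrives. Every name here (`RecCache`, `serve_recs`, `on_purchase`, and the `in_stock`, `purchased`, and `ended_campaign_items` sets) is hypothetical and stands in for whatever the real cache client and inventory, order, and campaign services expose. It is a toy in-memory model, not the storefront's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class RecCache:
    """Toy in-memory stand-in for the key-value recommendation cache (hypothetical)."""
    ttl_seconds: float
    _store: dict = field(default_factory=dict)  # user_id -> (expires_at, item_ids)

    def put(self, user_id, item_ids, now):
        # Store the recommendations with an absolute expiry time.
        self._store[user_id] = (now + self.ttl_seconds, list(item_ids))

    def get(self, user_id, now):
        # Return the cached list, or None if the entry is missing or expired.
        entry = self._store.get(user_id)
        if entry is None or entry[0] <= now:
            self._store.pop(user_id, None)
            return None
        return entry[1]

    def invalidate(self, user_id):
        # Event-driven invalidation: drop the entry regardless of its TTL.
        self._store.pop(user_id, None)


def serve_recs(cache, user_id, now, in_stock, purchased, ended_campaign_items):
    """Serve-time filter: hide cached items that are no longer valid to show.

    Returns None on a cache miss so the caller can re-score.
    """
    cached = cache.get(user_id, now)
    if cached is None:
        return None
    return [
        item for item in cached
        if item in in_stock
        and item not in purchased
        and item not in ended_campaign_items
    ]


def on_purchase(cache, user_id):
    """Hypothetical purchase-event handler: force a re-score on the next page load."""
    cache.invalidate(user_id)
```

The serve-time filter is the quick mitigation because it stops bad items from reaching users without flushing the whole cache and stampeding the scoring path. The event handler is the longer-term piece: a 24-hour TTL alone cannot react to inventory, purchase, or campaign changes that happen within that window.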