Question
Your read path is a large Redis cache fronting a Postgres durable store for a product-catalog API. At 14:02 the catalog DB's CPU and disk-read IOPS saturate and API p99 jumps from 20ms to 2.4s. Dashboards: Redis hit-rate cratered from 94% to 31% over about two minutes; Redis `evicted_keys` spiked hard during that window; Redis memory hit `maxmemory` right before the drop; origin (Postgres) query rate went from 4k/s to 70k/s. A deploy at 13:55 added a new denormalized field to cached catalog objects, increasing each cached entry's size by roughly 3x. How do you triage, mitigate, and prevent recurrence?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
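Before forming hypotheses, it can help to sanity-check the dashboard numbers from the question against each other. The sketch below is a back-of-the-envelope model, not a measurement. It assumes steady client traffic across the incident, exactly one origin query per cache miss, and a cache whose key capacity scales inversely with entry size (per-key overhead ignored). The function names are hypothetical and introduced only for illustration.

```python
# Back-of-the-envelope check of the incident signals.
# Assumptions (not from the source): steady client traffic, one origin
# query per cache miss, and negligible per-key overhead in Redis.

def total_request_rate(origin_qps: float, hit_rate: float) -> float:
    """Infer total read traffic from the origin rate and the cache hit rate."""
    return origin_qps / (1.0 - hit_rate)


def expected_origin_qps(total_qps: float, hit_rate: float) -> float:
    """Origin load predicted by the miss rate alone."""
    return total_qps * (1.0 - hit_rate)


def fraction_of_keys_that_fit(size_multiplier: float) -> float:
    """Share of the old working set that fits in the same maxmemory
    after each entry grows by size_multiplier."""
    return 1.0 / size_multiplier


if __name__ == "__main__":
    total = total_request_rate(origin_qps=4_000, hit_rate=0.94)
    predicted = expected_origin_qps(total, hit_rate=0.31)
    observed = 70_000
    print(f"inferred total traffic: {total:,.0f} req/s")
    print(f"predicted origin load at 31% hit rate: {predicted:,.0f} q/s")
    print(f"observed / predicted: {observed / predicted:.2f}x")
    print(f"keys that still fit after 3x growth: {fraction_of_keys_that_fit(3.0):.0%}")
```

Under these assumptions the miss rate alone predicts roughly 46k origin queries per second, yet the dashboard shows 70k/s. That gap is a hypothesis to test rather than a conclusion: it could point to amplification such as client retries or many concurrent misses on the same key, or it could mean one of the assumptions (steady traffic, one query per miss) does not hold. The capacity estimate is consistent with the other signals: a 3x larger entry means about a third of the previous keys fit under the same `maxmemory`, which matches the `evicted_keys` spike and the hit-rate collapse.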