Code Room
On-callHard
Question
Over the last 30 minutes your Redis cache (maxmemory set, allkeys-lru eviction) has degraded: hit ratio fell from 95% to 60%, evicted_keys is climbing by tens of thousands per second, the origin DB load is doubling, and Redis p99 GET latency rose from 0.4ms to 9ms. used_memory is pinned at maxmemory. A new feature shipped this morning that caches per-user personalized search results keyed by user_id + a serialized query object, with no explicit TTL. How do you triage and mitigate?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
Learn the concepts
Loading whiteboard…
Run or narrate your approach, then ask the coach.