Code Room
On-callHard
Question
Your read path is cache (Redis) in front of a database, ~95% hit rate normally. At 21:00 the managed Redis has a brief blip — it's only unreachable for 8 seconds — but it comes back EMPTY (the failover cleared it). For the next several minutes your database is melting: DB CPU 100%, connection pool exhausted, query latency 10x, app error rate 35%. Redis itself is now healthy and reachable; it's just cold. No deploy. The 8-second blip ended minutes ago. How do you triage and mitigate?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
Learn the concepts
Loading whiteboard…
Run or narrate your approach, then ask the coach.