Code Room
On-call · Medium
Question
At 00:03 UTC your synthetic monitors go red and the support queue floods with 'your site is not secure' and NET::ERR_CERT_DATE_INVALID screenshots. Browsers and your mobile app both refuse to connect to api.yourapp.com. The load balancers are healthy, CPU is flat, and there was no deploy. Checking the cert with openssl s_client shows notAfter was yesterday at 23:59 UTC. You're on call. How do you triage, restore service, and prevent a repeat?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
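For the "prevents a repeat" part, one common guardrail is a scheduled check that alerts well before a certificate's notAfter date. The following is a minimal sketch, not a definitive implementation: the thresholds WARN_DAYS and PAGE_DAYS and the function names are hypothetical, and it assumes the check runs from a host that can reach the endpoint on port 443. Note that with Python's default verification settings, a certificate that has already expired fails the TLS handshake, so the sketch reports that case as "invalid" rather than reading the date.

```python
import socket
import ssl
from datetime import datetime, timezone

# Hypothetical alert thresholds; tune to your renewal process.
WARN_DAYS = 30
PAGE_DAYS = 7


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a cert's notAfter string (e.g. 'Jan  1 00:00:00 2030 GMT')."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def classify(days_left: float) -> str:
    """Map remaining lifetime to an alert level."""
    if days_left <= 0:
        return "expired"
    if days_left <= PAGE_DAYS:
        return "page"
    if days_left <= WARN_DAYS:
        return "warn"
    return "ok"


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read notAfter from the certificate the server presents (verification enabled)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check(host: str) -> str:
    """Return an alert level for `host`; 'invalid' covers any verification failure."""
    try:
        not_after = fetch_not_after(host)
    except ssl.SSLCertVerificationError:
        # Expired, wrong hostname, or untrusted chain: clients would be failing too.
        return "invalid"
    return classify(days_until_expiry(not_after, datetime.now(timezone.utc)))
```

Running check("api.yourapp.com") on a schedule and routing "warn", "page", and "invalid" to the on-call rotation would surface an upcoming expiry days in advance instead of at 00:03 UTC.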
Run or narrate your approach, then ask the coach.