Code Room
On-call · Hard · oc-g287
Subject: Cert expiry · Level: Senior–Staff · ~40 min · Common in Reliability & on-call interviews · Industries: Technology

Question

At 11:20 a subset of your B2B partners — about 30% of inbound API traffic — start failing all calls with 'unable to get local issuer certificate' / 'certificate verify failed'. Your own leaf cert was renewed two weeks ago and shows 300+ days remaining; openssl s_client from your bastion validates the full chain fine. Dashboards: TLS handshake failures spiked only on connections from partners running older pinned trust stores; modern browsers and your monitoring probes are unaffected. The only recent change is that your ACME automation rotated the certificate yesterday during the normal renewal window. How do you triage and mitigate?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
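One way to test the "older pinned trust stores" signal directly is to validate the same served chain against a trust bundle other than the bastion's. This is an added example, not part of the original page: the PKI is built offline, the names old-root, new-root and *.example.test are constructed, and the idea that the partners' stores lack the issuing root is a hypothesis to check, not something the page states.

```bash
#!/usr/bin/env bash
# Added example: every name and certificate here is constructed for the demo.
set -eu

# Validate LEAF using only the CAs in STORE; CHAIN is what the server sends.
# -no-CApath keeps the system CA directory out of the check.
verify_with_store() {  # args: STORE CHAIN LEAF
  openssl verify -no-CApath -CAfile "$1" -untrusted "$2" "$3" 2>&1
}

# Build a throwaway PKI in DIR: old-root (signs nothing), new-root -> int -> leaf.
build_demo_pki() {  # args: DIR
  ( cd "$1"
    printf '%s\n' '[req]' 'distinguished_name=dn' 'prompt=no' '[dn]' 'CN=unused' \
      '[v3_ca]' 'basicConstraints=critical,CA:TRUE' > pki.cnf
    for ca in old-root new-root; do
      openssl req -x509 -config pki.cnf -extensions v3_ca -newkey rsa:2048 -nodes \
        -subj "/CN=$ca" -keyout "$ca.key" -out "$ca.pem" -days 30 2>/dev/null
    done
    for name in int leaf; do
      openssl req -new -config pki.cnf -newkey rsa:2048 -nodes \
        -subj "/CN=$name.example.test" -keyout "$name.key" -out "$name.csr" 2>/dev/null
    done
    openssl x509 -req -in int.csr -CA new-root.pem -CAkey new-root.key \
      -CAcreateserial -extfile pki.cnf -extensions v3_ca -days 30 -out int.pem 2>/dev/null
    openssl x509 -req -in leaf.csr -CA int.pem -CAkey int.key \
      -CAcreateserial -days 30 -out leaf.pem 2>/dev/null
  )
}

# Same leaf and intermediate, two different trust stores.
demo() {
  dir=$(mktemp -d)
  build_demo_pki "$dir"
  echo "store with new root (like the bastion):"
  verify_with_store "$dir/new-root.pem" "$dir/int.pem" "$dir/leaf.pem" || true
  echo "pinned store with old root only (like the failing partners):"
  verify_with_store "$dir/old-root.pem" "$dir/int.pem" "$dir/leaf.pem" || true
  rm -rf "$dir"
}
demo
```

Against a live endpoint, CHAIN and LEAF would come from the certificates printed by `openssl s_client -showcerts`, and STORE from a trust bundle a failing partner actually runs.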
