Question
Your TLS certs are valid for months, but at 09:30 a slice of strict clients (some banks' API gateways, hardened browsers) start failing handshakes to your endpoint with 'OCSP response expired' / 'revocation status unavailable', while most clients connect fine. Dashboards show your origin and certs are healthy; the failing clients are the ones that hard-require a fresh stapled OCSP response. Your TLS terminator staples OCSP responses that it fetches from the CA's OCSP responder and caches. The CA's OCSP responder has been returning errors / timing out since ~08:45. How do you triage and mitigate?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
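
As an added example (not part of the original question), the sketch below classifies a cached staple from its validity window and the terminator's refresh history, so that failed fetches raise a signal while the staple still has lifetime left rather than at expiry. The function name, the half-of-validity refresh policy, the date and the 4-day validity window are assumptions; only the 08:45 and 09:30 times come from the scenario.

```python
from datetime import datetime, timedelta, timezone

def staple_status(this_update, next_update, fetch_log, now, refresh_fraction=0.5):
    """Classify a cached OCSP staple from its validity window and refresh history.

    fetch_log: (timestamp, succeeded) refresh attempts, oldest first.
    refresh_fraction: assumed policy, a newer staple should be in place once
    this fraction of the validity window has elapsed.
    """
    past = [(t, ok) for t, ok in fetch_log if t <= now]
    # Start of the current unbroken run of failed fetches, if the latest failed.
    failing_since = None
    for t, ok in reversed(past):
        if ok:
            break
        failing_since = t
    remaining = next_update - now
    refresh_due = this_update + (next_update - this_update) * refresh_fraction
    if remaining <= timedelta(0):
        state = "EXPIRED"          # clients that hard-require a fresh staple fail
    elif failing_since is not None:
        state = "AT_RISK"          # responder failing; `remaining` is the only buffer
    elif now >= refresh_due:
        state = "REFRESH_OVERDUE"  # still valid, but past the assumed refresh point
    else:
        state = "OK"
    return {"state": state, "remaining": remaining, "failing_since": failing_since}

# Constructed timeline (date and 4-day validity window are made up; the
# 08:45 and 09:30 times are the ones from the scenario).
def at(hour, minute):
    return datetime(2024, 1, 10, hour, minute, tzinfo=timezone.utc)

NEXT_UPDATE = at(9, 30)
THIS_UPDATE = NEXT_UPDATE - timedelta(days=4)
FETCH_LOG = [(THIS_UPDATE + timedelta(minutes=5), True),
             (at(8, 45), False), (at(8, 55), False), (at(9, 5), False)]

if __name__ == "__main__":
    for now in (at(8, 40), at(8, 50), at(9, 30)):
        s = staple_status(THIS_UPDATE, NEXT_UPDATE, FETCH_LOG, now)
        print(now.time(), s["state"], "remaining:", s["remaining"])
```

In this constructed timeline an alert on `AT_RISK` fires at 08:45 with 45 minutes of staple lifetime left, and `REFRESH_OVERDUE` would have flagged the late refresh two days earlier.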