Question
In an enterprise ITSM platform, an admin updates a config that sets the default business-hours calendar used for SLA clocks (e.g., changing the org default from 'UTC 9-5' to 'follow-the-sun' regional). The change is applied at 16:00 and propagates to the SLA engine. Within an hour, the support team reports that a batch of in-flight tickets suddenly flipped from 'on track' to 'breached,' and some auto-escalations fired and paged on-call managers needlessly. Dashboards: no service errors; the SLA-engine recompute job ran at config-apply time and re-derived remaining-time for OPEN tickets against the new calendar; a 'breach count' metric jumped at 16:00 then partially settled. Triage, explain the mechanism, then mitigate.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.