Question
At 00:00 UTC on the night the clocks 'fell back' (DST ended in one region), a session-and-rate-limit store starts misbehaving: some users are logged out an hour early, some abusers' rate limits never reset, and a TTL-based dedup window briefly lets duplicate events through. The dashboards show four things: hosts in one app fleet log timestamps an hour off from the others; store keys are computed from a local-time wall clock rather than a monotonic or UTC source; the dedup window keys events by a truncated local-time bucket; and `TZ` is set per host by config, but three hosts were rebuilt last week without the timezone package, so they fall back to UTC while the rest use local time. How do you triage this time-related incident, stop the wrong expiries and the duplicate leakage, and reconcile the affected state?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
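
To make the bucket-key failure in the question concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the fixed offsets (UTC-4 before the change, UTC-5 after), the date, the 5-minute window, and the function names `local_bucket_key` and `epoch_bucket_key` are hypothetical and do not come from the incident. It contrasts a key built from the formatted local wall clock with a key built from seconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets for a region whose clocks fall back one hour.
DST = timezone(timedelta(hours=-4))       # offset before the change
STANDARD = timezone(timedelta(hours=-5))  # offset after the change

BUCKET_MINUTES = 5  # hypothetical dedup window size


def local_bucket_key(instant: datetime, host_tz: timezone) -> str:
    """Fragile pattern: truncate the host's local wall clock to a bucket."""
    local = instant.astimezone(host_tz)
    minute = local.minute - local.minute % BUCKET_MINUTES
    truncated = local.replace(minute=minute, second=0, microsecond=0)
    # The formatted key carries no offset, so it is ambiguous across hosts
    # and across the repeated hour.
    return truncated.strftime("%Y-%m-%dT%H:%M")


def epoch_bucket_key(instant: datetime) -> int:
    """Timezone-independent pattern: bucket by seconds since the epoch."""
    return int(instant.timestamp()) // (BUCKET_MINUTES * 60)


# Two real instants one hour apart, on either side of the fall-back.
first = datetime(2024, 11, 3, 5, 30, tzinfo=timezone.utc)
second = first + timedelta(hours=1)

# The repeated local hour makes two different instants share one key.
print(local_bucket_key(first, DST), local_bucket_key(second, STANDARD))

# A host that fell back to UTC disagrees with a local-time host
# about the key for the very same instant.
print(local_bucket_key(first, timezone.utc), local_bucket_key(first, DST))

# Epoch-based keys stay distinct per instant and identical across hosts.
print(epoch_bucket_key(first), epoch_bucket_key(second))
```

Under these assumptions the local-time key has two problems at once: hosts with different `TZ` settings compute different keys for the same event, and the repeated hour maps two distinct instants to one key. The epoch-based key avoids both because it never passes through a timezone conversion. For measuring elapsed time inside a single process, a monotonic clock such as `time.monotonic()` is the usual choice, but its values are not comparable across hosts, so it cannot serve as a shared store key.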