Question
A rolling deploy updates the authoritative game-server binary that hosts live match sessions (sticky: a match lives on one server for its duration). The deploy drains and replaces servers as matches end, but to roll quickly the orchestrator also evicts servers whose matches are merely 'long-running.' Mid-rollout, players in long matches (ranked games >30 min) are disconnected mid-game in a wave, ranked results are voided, and the matchmaking queue spikes as those players re-queue. Dashboards: per-server match count drains to zero normally on most servers, but a cohort of servers hosting long matches was force-evicted at the drain deadline; disconnect events cluster on those evictions; CPU and error rate are otherwise normal. Triage, explain, and prevent.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
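The drain behavior described in the question can be modelled in a few lines, which may help when separating root cause (the eviction rule) from symptom (the disconnect wave). This is a minimal sketch, not the orchestrator's actual logic: `ServerState`, `Action`, `drain_decision`, and the `allow_force_evict` switch are hypothetical names, and the 1800-second deadline in the usage lines is an illustrative value, not one taken from the scenario.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP_DRAINING = "keep_draining"  # accept no new matches, wait for live ones to end
    REPLACE = "replace"              # server is empty, safe to swap the binary
    FORCE_EVICT = "force_evict"      # terminates live matches on the server


@dataclass
class ServerState:
    active_matches: int      # live sticky matches still hosted on this server
    seconds_draining: float  # time since the server stopped taking new matches


def drain_decision(server: ServerState,
                   drain_deadline_s: float,
                   allow_force_evict: bool) -> Action:
    """Decide what the rollout does with one draining server (hypothetical rule)."""
    if server.active_matches == 0:
        # Normal path: match count has drained to zero.
        return Action.REPLACE
    if allow_force_evict and server.seconds_draining >= drain_deadline_s:
        # The path that matches the incident: a deadline ends a live match.
        return Action.FORCE_EVICT
    return Action.KEEP_DRAINING


# A server still hosting one long match after a hypothetical 1800 s deadline.
long_match_server = ServerState(active_matches=1, seconds_draining=2000.0)
with_eviction = drain_decision(long_match_server, 1800.0, allow_force_evict=True)
without_eviction = drain_decision(long_match_server, 1800.0, allow_force_evict=False)
```

Under these assumptions, the only input that turns a slow drain into a player-facing disconnect is the deadline branch, so disabling it (or pausing the rollout) is a plausible first mitigation, and making replacement conditional on `active_matches == 0` is one candidate for the prevention step.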