Question
At 10:05, about 15% of users start getting logged out and seeing 401s: their JWT-authenticated API calls fail with 'token used before issued' (an iat/nbf check, meaning the token's issue or not-before time looks like it is in the future) or 'token expired' (exp), even on tokens minted seconds ago. It's not all users and not all servers: the failures cluster on requests that happen to hit a specific subset of your API pods. Internal TOTP/2FA validation is also flaky on those same pods. A new batch of nodes was added to the cluster yesterday during an autoscaling event. There was no application deploy. Login itself (token minting) succeeds; validation downstream is what rejects. Walk through how you'd find and fix this.
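For reference, here is a minimal sketch of why validation can fail on some pods while minting succeeds. It assumes each validating pod checks time claims and TOTP codes against its own local clock, as JWT and TOTP validators generally do. `check_time_claims`, `totp_valid`, `MINT_TIME`, `USER_TIME` and the skew values are hypothetical illustrations, not any specific library's API. The reason strings echo the messages quoted above. `totp` follows RFC 6238 (HMAC-SHA1, 30 s step) and uses the RFC's published test secret.

```python
import hashlib
import hmac
import struct

# Hypothetical validator mirroring the time checks JWT libraries typically run.
# `now` is the validating pod's own clock (Unix seconds). No other clock is
# consulted, so a pod with a skewed clock judges a fresh token by the wrong time.
def check_time_claims(claims: dict, now: int, leeway: int = 0) -> str:
    if "iat" in claims and now + leeway < claims["iat"]:
        return "token used before issued"   # iat appears to be in the future
    if "nbf" in claims and now + leeway < claims["nbf"]:
        return "token not valid yet"        # nbf appears to be in the future
    if "exp" in claims and now - leeway >= claims["exp"]:
        return "token expired"              # exp appears to be in the past
    return "ok"


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1: the code depends only on unix_time // step."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp_valid(secret: bytes, code: str, now: int, step: int = 30, window: int = 1) -> bool:
    # Accept codes from `window` time steps either side of the pod's local clock.
    return any(
        hmac.compare_digest(totp(secret, now + d * step, step), code)
        for d in range(-window, window + 1)
    )


MINT_TIME = 1_700_000_000  # hypothetical: clock on the (healthy) login pod
claims = {"iat": MINT_TIME, "nbf": MINT_TIME, "exp": MINT_TIME + 3600}

print(check_time_claims(claims, MINT_TIME + 5))          # ok
print(check_time_claims(claims, MINT_TIME + 5 - 120))    # token used before issued
print(check_time_claims(claims, MINT_TIME + 5 + 7200))   # token expired

SECRET = b"12345678901234567890"  # RFC 6238 test secret, not a real key
USER_TIME = 1_700_000_000         # hypothetical: correct time on the user's device
code = totp(SECRET, USER_TIME)
print(totp_valid(SECRET, code, USER_TIME))       # True (pod clock in sync)
print(totp_valid(SECRET, code, USER_TIME - 90))  # pod 90 s slow, outside the ±1 step window
```

The `leeway` argument tolerates small skew but cannot absorb minutes of drift. Widening it hides the symptom on the affected pods without fixing their clocks.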
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.