Code Room
On-call · Hard · oc-g555
Subject: On call · Level: Senior–Staff · ~40 min · Common in: Security, Reliability & on-call interviews · Industries: Technology

Question

At 10:05 a fraction of users (about 15%) start getting logged out and seeing 401s. Their JWT-authenticated API calls fail with 'token used before issued' (a not-yet-valid check on iat or nbf) or 'token expired' (exp), even on tokens minted seconds ago. It's not all users and not all servers: the failures cluster on requests that happen to hit a specific subset of your API pods. Internal TOTP/2FA validation is also flaky on those same pods. A new batch of nodes was added to the cluster yesterday during an autoscaling event. There was no application deploy. Login itself (token minting) succeeds; it is validation downstream that rejects. Walk through how you'd find and fix this.
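Both error messages come from the validating server comparing a token's time claims (iat, nbf, exp, in Unix seconds) against its own local clock, and TOTP codes are likewise derived from the local clock in fixed time steps (30 seconds is the RFC 6238 default). The sketch below is a minimal, hypothetical illustration of why a validator whose clock disagrees with the minting server would produce exactly these failures. The function names, the 60-second leeway, and the one-step TOTP window are assumptions for illustration, not values from any particular stack or library.

```python
import hashlib
import hmac
import struct

# Hypothetical tolerances; real JWT/TOTP libraries make these configurable.
LEEWAY_S = 60      # allowed clock disagreement for JWT time checks
TOTP_STEP_S = 30   # RFC 6238 default time step
TOTP_WINDOW = 1    # accept codes from +/- this many steps around "now"


def check_jwt_times(claims: dict, now: float, leeway: float = LEEWAY_S) -> str:
    """Return 'ok' or why a validator whose local clock reads `now` rejects."""
    if "iat" in claims and now + leeway < claims["iat"]:
        return "used before issued (iat)"
    if "nbf" in claims and now + leeway < claims["nbf"]:
        return "not yet valid (nbf)"
    if "exp" in claims and now - leeway >= claims["exp"]:
        return "expired (exp)"
    return "ok"


def totp(secret: bytes, t: float, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1, computed for Unix time t."""
    counter = int(t // TOTP_STEP_S)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**digits).zfill(digits)


def verify_totp(secret: bytes, code: str, now: float, digits: int = 6) -> bool:
    """Accept `code` if it matches any time step within the window around `now`."""
    return any(
        hmac.compare_digest(totp(secret, now + k * TOTP_STEP_S, digits), code)
        for k in range(-TOTP_WINDOW, TOTP_WINDOW + 1)
    )


if __name__ == "__main__":
    minted_at = 1_700_000_000          # hypothetical mint time on a healthy node
    claims = {"iat": minted_at, "nbf": minted_at, "exp": minted_at + 900}
    secret = b"12345678901234567890"   # RFC 6238 SHA-1 test secret

    for skew in (0, -300, 3600):       # validator clock offset in seconds
        now = minted_at + 5 + skew     # request arrives 5 s after minting
        user_code = totp(secret, minted_at + 5)  # user's device clock is correct
        print(skew, check_jwt_times(claims, now), verify_totp(secret, user_code, now))
```

With these assumed numbers, a validator running 5 minutes slow reports "used before issued" and one running an hour fast reports "expired", both on a token minted 5 seconds earlier. The same offset pushes the TOTP check many steps outside its window. That is one hypothesis worth testing against the real per-node signals.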

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
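One way to turn "failures cluster on some pods" into a concrete signal is to bucket 401 rates by the node each pod runs on and check whether the outliers are the nodes added yesterday. A minimal sketch, assuming you can export (node, status) pairs from access logs; the record format, the function name `suspect_nodes`, the 20% threshold, and the node names are all hypothetical.

```python
from collections import Counter


def suspect_nodes(records, new_nodes, threshold=0.2):
    """Flag nodes whose share of 401 responses exceeds `threshold`.

    records: iterable of (node_name, http_status) pairs (hypothetical log export).
    new_nodes: node names added by the autoscaling event.
    Returns (flagged, flagged_and_new), both sorted.
    """
    totals, unauthorized = Counter(), Counter()
    for node, status in records:
        totals[node] += 1
        if status == 401:
            unauthorized[node] += 1
    flagged = sorted(n for n in totals if unauthorized[n] / totals[n] > threshold)
    return flagged, sorted(set(flagged) & set(new_nodes))


if __name__ == "__main__":
    # Hypothetical sample: two older nodes and one node from the new batch.
    sample = (
        [("old-1", 200)] * 90 + [("old-1", 401)] * 2
        + [("new-1", 401)] * 40 + [("new-1", 200)] * 10
        + [("old-2", 200)] * 50
    )
    print(suspect_nodes(sample, {"new-1", "new-2"}))
```

If the flagged set matches the new batch, those nodes are candidates to cordon and drain (for example with `kubectl cordon` and `kubectl drain`, assuming the cluster is Kubernetes) to stop the bleeding, while you confirm the root cause on them directly.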

Diagram & narrate the incident