On-callMediumoc-g672

Subject Model serving tokenizer version mismatchLevel Mid–Senior~30 minCommon in ML systems · Reliability & on-call interviewsIndustries Technology

Question

After a routine dependency bump, your NLP intent-classification serving service's accuracy quietly drops: support routing mislabels a noticeably higher share of tickets, though the service returns 200s with normal latency and confidence scores still look 'high'. Dashboards: a library upgrade in the serving image bumped the tokenizer package a minor version; the model was trained and exported against the previous tokenizer version; spot-checks show the new tokenizer splits certain tokens (numbers, hyphenated words, emoji) differently, so the input token IDs fed to the model differ from training. The model file itself is unchanged. How do you triage and respond?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Learn the concepts

Diagram & narrate the incident

Loading whiteboard…

Run or narrate your approach, then ask the coach.