On-callMediumoc-g577

Subject Inference embedding store latencyLevel Mid–Senior~30 minCommon in ML systems interviewsIndustries Technology

Question

Your personalization service fetches precomputed user + item embeddings from an embedding store (cache in front of a database) before scoring. At 18:00 p99 latency on the embedding-fetch step jumps from 15ms to 600ms and the backing DB shows a read-QPS spike. Dashboards: the embedding cache hit-rate fell from 98% to 60% at 18:00; a job that nightly regenerates and bulk-reuploads all item embeddings ran early today at 18:00 and rewrote every item key (changing values), and the cache evicted/invalidated en masse. No app deploy. Triage and mitigate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Learn the concepts

Diagram & narrate the incident

Loading whiteboard…

Run or narrate your approach, then ask the coach.