On-callHardoc-g569

Subject Inference vector db latencyLevel Senior–Staff~35 minCommon in ML systems interviewsIndustries Technology

Question

Your semantic-search service does an ANN lookup in a vector database (HNSW index, ~200M embeddings) before reranking. At 11:30 end-to-end p99 doubled to 900ms; tracing shows the time is entirely in the vector-DB query span, not the embedding model or the reranker. Dashboards: vector-DB CPU is high, the index just finished a large background merge/compaction after a bulk upsert of 20M new vectors this morning, recall@10 dipped slightly, and query node memory is near the cap causing some on-disk index reads. No application deploy. Triage and mitigate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Learn the concepts

Diagram & narrate the incident

Loading whiteboard…

Run or narrate your approach, then ask the coach.