Code Room
On-callHard
Question
Your semantic-search service does an ANN lookup in a vector database (HNSW index, ~200M embeddings) before reranking. At 11:30 end-to-end p99 doubled to 900ms; tracing shows the time is entirely in the vector-DB query span, not the embedding model or the reranker. Dashboards: vector-DB CPU is high, the index just finished a large background merge/compaction after a bulk upsert of 20M new vectors this morning, recall@10 dipped slightly, and query node memory is near the cap causing some on-disk index reads. No application deploy. Triage and mitigate.
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
Learn the concepts
Loading whiteboard…
Run or narrate your approach, then ask the coach.