Question
Design how a vector search service rebuilds its index without downtime. You host 2B 1024-dim vectors across a sharded HNSW index serving 8k QPS at 25ms p99. Periodically you must rebuild — because the embedding model was upgraded, because deletes have fragmented the graph and recall has degraded, or because you're re-sharding. The rebuild takes hours and is resource-heavy. Walk through how you rebuild a shard while it keeps serving, how you cut over without a recall cliff or a latency spike, and how you handle writes that arrive during the multi-hour rebuild.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
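As a starting point for the scale clarification, the memory footprint can be estimated with a short back-of-envelope sketch. The question does not state the vector precision, the HNSW parameters, or the per-shard memory budget, so everything below other than the 2B vectors and 1024 dimensions is an assumption: float32 components, M = 16 with up to 2*M neighbour links at layer 0 (upper layers ignored), 4-byte node ids, and a hypothetical 256 GiB of usable RAM per shard.

```python
# Back-of-envelope sizing for the index described in the question.
# Only NUM_VECTORS and DIM come from the question; the rest are assumptions.

NUM_VECTORS = 2_000_000_000   # from the question
DIM = 1024                    # from the question


def raw_vector_bytes(num_vectors, dim, bytes_per_component=4):
    """Bytes for the raw vectors (assumes float32 unless told otherwise)."""
    return num_vectors * dim * bytes_per_component


def hnsw_link_bytes(num_vectors, m=16, bytes_per_id=4):
    """Rough layer-0 graph size: up to 2*M neighbour ids per node.

    Upper layers are ignored; they hold a small fraction of the nodes.
    """
    return num_vectors * 2 * m * bytes_per_id


def shards_needed(total_bytes, usable_bytes_per_shard):
    """Ceiling division: minimum shards that fit total_bytes in memory."""
    return -(-total_bytes // usable_bytes_per_shard)


vectors = raw_vector_bytes(NUM_VECTORS, DIM)
links = hnsw_link_bytes(NUM_VECTORS)
total = vectors + links

# Hypothetical budget: 256 GiB of usable RAM per shard.
shard_budget = 256 * 2**30
shards = shards_needed(total, shard_budget)

print(f"raw vectors: {vectors / 1e12:.3f} TB")
print(f"graph links: {links / 1e9:.0f} GB")
print(f"shards at 256 GiB each: {shards}")

# While a shard is being rebuilt, the old and new index coexist, so the
# rebuild needs roughly one extra shard's worth of memory somewhere.
print(f"extra memory per in-flight rebuild: ~{total / shards / 2**30:.0f} GiB")
```

Under these assumptions the raw vectors dominate the footprint, which is why the choice of precision (float32 versus a quantized format) and whether the rebuild runs on the serving node or on separate capacity are worth pinning down before the design.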