System design · Hard
Question
Design a vector search service that holds 5 billion 768-dim embeddings (image embeddings for a visual-search product), serving 10k QPS of nearest-neighbor queries at p99 < 50ms with recall@10 ≥ 0.95. The corpus grows by ~50M vectors/day and old vectors are occasionally deleted. Explain the index choice, how you shard across machines so a single query stays fast, the memory/cost math that drives the design, and how you handle continuous inserts and deletes without recall collapsing.
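The memory math is a good place to start. Below is a minimal back-of-envelope sketch of the raw storage implied by the question's numbers. The corpus size, dimension, and daily growth come from the question. Everything else is an assumption chosen for illustration: float32 and float16 storage widths, a hypothetical 96-byte product-quantization (PQ) code per vector, and a hypothetical 256 GB of usable RAM per node. Index overhead such as graph links, inverted lists, IDs, and metadata is deliberately left out.

```python
# Back-of-envelope memory math for the corpus described in the question.
# From the question: corpus size, dimension, daily growth.
# Assumptions (hypothetical, for illustration only): float32 = 4 bytes/dim,
# float16 = 2 bytes/dim, a 96-byte PQ code per vector, 256 GB usable RAM
# per node. Decimal units (1 GB = 10**9 bytes).
# Index overhead (graph links, inverted lists, IDs, metadata) is NOT included.

N_VECTORS = 5_000_000_000     # total embeddings
DIM = 768                     # dimensions per embedding
DAILY_INSERTS = 50_000_000    # new vectors per day

GB = 10**9
TB = 10**12
NODE_RAM = 256 * GB           # hypothetical usable RAM per node


def raw_bytes(n: int, dim: int, bytes_per_dim: int) -> int:
    """Bytes to store n uncompressed vectors of the given element width."""
    return n * dim * bytes_per_dim


def pq_bytes(n: int, code_bytes: int) -> int:
    """Bytes to store n fixed-size compressed codes."""
    return n * code_bytes


def nodes_needed(total_bytes: int, per_node: int) -> int:
    """Ceiling division: nodes required to hold total_bytes in RAM."""
    return -(-total_bytes // per_node)


fp32 = raw_bytes(N_VECTORS, DIM, 4)
fp16 = raw_bytes(N_VECTORS, DIM, 2)
pq96 = pq_bytes(N_VECTORS, 96)
daily_fp32 = raw_bytes(DAILY_INSERTS, DIM, 4)

print(f"float32 vectors: {fp32 / TB:.2f} TB, {nodes_needed(fp32, NODE_RAM)} nodes")
print(f"float16 vectors: {fp16 / TB:.2f} TB, {nodes_needed(fp16, NODE_RAM)} nodes")
print(f"96-byte PQ codes: {pq96 / GB:.2f} GB, {nodes_needed(pq96, NODE_RAM)} nodes")
print(f"daily growth (float32): {daily_fp32 / GB:.2f} GB/day")
```

Under these assumptions, the raw float32 vectors come to 15.36 TB, or about 60 such nodes before any index structure is counted. In contrast, 96-byte codes come to 480 GB. Daily inserts add roughly 153.6 GB of raw float32 data. That spread between uncompressed and compressed storage is the kind of memory/cost trade-off the question asks you to reason about, alongside the QPS, latency, and recall targets.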
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts (data model, bottlenecks, consistency, failure modes) and name the trade-offs you are making.