Code Room
System design · Hard
Question
Design the operational lifecycle of an ANN (approximate nearest neighbor) index for a semantic-retrieval service holding 500M item embeddings, serving 20k QPS at p99 < 30ms, where ~5M items are added/changed and ~2M removed every day. The hard part isn't the query — HNSW handles that — it's keeping the index correct and fresh: HNSW graphs don't support efficient deletes, full rebuilds take hours, and a stale index silently returns retired/deleted items. Design the update and rebuild strategy.
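As a rough sizing aid, the figures in the question can be turned into rates before any design work. The sketch below is a back-of-envelope calculation only. The 768-dimensional float32 embeddings and the 10% stale-entry budget are assumptions for illustration; the question specifies neither, and `churn_summary` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def churn_summary(
    total_items: int = 500_000_000,      # from the question
    upserts_per_day: int = 5_000_000,    # "added/changed", from the question
    deletes_per_day: int = 2_000_000,    # from the question
    dim: int = 768,                      # ASSUMPTION: not given in the question
    bytes_per_value: int = 4,            # ASSUMPTION: float32 storage
    stale_budget: float = 0.10,          # ASSUMPTION: tolerate 10% stale graph entries
) -> dict:
    """Turn the question's daily volumes into rates and time horizons."""
    mutations = upserts_per_day + deletes_per_day
    return {
        # Share of the corpus touched per day.
        "daily_churn_fraction": mutations / total_items,
        # Average write rate if mutations are spread evenly over the day.
        "mutations_per_second": mutations / SECONDS_PER_DAY,
        # Best case: only deletes leave stale entries behind.
        "days_to_budget_deletes_only": stale_budget * total_items / deletes_per_day,
        # Worst case: every added/changed item is a change that also
        # leaves its old vector behind as a stale entry.
        "days_to_budget_worst_case": stale_budget * total_items / mutations,
        # Raw vectors only; graph links and metadata are extra.
        "raw_vector_bytes": total_items * dim * bytes_per_value,
    }


if __name__ == "__main__":
    for name, value in churn_summary().items():
        print(f"{name}: {value:,.3f}")
```

Under these assumptions about 1.4% of the corpus changes per day, at an average of roughly 81 mutations per second, and a 10% stale-entry budget is exhausted in somewhere between about 7 and 25 days. That range is what bounds how often a full rebuild or compaction has to complete.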
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
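One commonly discussed way to go deep on the delete problem is to keep a tombstone set next to the index, over-fetch at query time, and filter out deleted IDs before returning results. The sketch below illustrates that idea and is not a prescribed answer. `TombstonedIndex` and `make_toy_search` are hypothetical names, the brute-force 1-D search merely stands in for a real ANN index, and the 10% rebuild threshold is an assumed value.

```python
from typing import Callable, Dict, List, Set, Tuple

Hit = Tuple[int, float]  # (item_id, distance)


def make_toy_search(vectors: Dict[int, float]) -> Callable[[float, int], List[Hit]]:
    """Brute-force 1-D nearest-neighbor search; a stand-in for an ANN index."""
    def search(query: float, k: int) -> List[Hit]:
        ranked = sorted(
            ((item_id, abs(value - query)) for item_id, value in vectors.items()),
            key=lambda hit: hit[1],
        )
        return ranked[:k]
    return search


class TombstonedIndex:
    """Wraps a search function that cannot delete, hiding removed items."""

    def __init__(self, search_fn: Callable[[float, int], List[Hit]],
                 indexed_count: int, rebuild_threshold: float = 0.10) -> None:
        self._search = search_fn
        self._indexed_count = indexed_count
        self._threshold = rebuild_threshold  # ASSUMPTION: 10% stale entries
        self._tombstones: Set[int] = set()

    def delete(self, item_id: int) -> None:
        # Soft delete: the vector stays in the index but is never returned.
        self._tombstones.add(item_id)

    def tombstone_fraction(self) -> float:
        return len(self._tombstones) / self._indexed_count

    def needs_rebuild(self) -> bool:
        return self.tombstone_fraction() >= self._threshold

    def search(self, query: float, k: int) -> List[Hit]:
        fetch = k
        while True:
            hits = self._search(query, fetch)
            live = [hit for hit in hits if hit[0] not in self._tombstones]
            # Stop once k live hits are found or the index is exhausted.
            if len(live) >= k or len(hits) < fetch:
                return live[:k]
            fetch *= 2  # over-fetch more aggressively and retry


if __name__ == "__main__":
    vectors = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 4.0}
    index = TombstonedIndex(make_toy_search(vectors), indexed_count=len(vectors))
    index.delete(1)
    print(index.search(0.0, k=2))   # [(2, 1.0), (3, 2.0)]
    print(index.needs_rebuild())    # True: 1 of 5 entries is a tombstone
```

The trade-off to name here is that filtering keeps results correct immediately, but each tombstone adds query-time work and wastes memory, so the tombstone fraction becomes the signal for scheduling a rebuild or compaction.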