Code Room
System design | Hard | sd-g336
Subject relevance: Level Senior–Staff | ~45 min | Common in distributed systems interviews | Industries: Technology

Question

Design a hybrid retrieval system that combines lexical (BM25 inverted index) and dense vector (ANN) search and returns a single fused ranked list. The two retrievers disagree often: lexical nails exact-term/code/SKU queries, vectors nail paraphrase/semantic queries, and their scores are on totally different scales (BM25 unbounded, cosine in [-1,1]). Corpus: 200M documents, 8k QPS, p99 < 200ms. How do you run both, and — the crux — how do you fuse two incomparable score distributions into one ranking?
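One common way to sidestep the scale mismatch is to fuse on ranks rather than raw scores, for example with reciprocal rank fusion (RRF). The sketch below is illustrative only and is not the reference answer for this question: the function name `rrf_fuse`, the sample document IDs, and the constant `k=60` (a commonly used default) are assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, top_n=10):
    """Fuse several ranked lists of doc IDs with reciprocal rank fusion.

    Only the rank position of each document is used, so unbounded BM25
    scores and cosine similarities in [-1, 1] never have to be compared.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document missing from a list simply contributes nothing.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc ID for determinism.
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return fused[:top_n]


# Hypothetical top-3 results from each retriever for one query.
lexical = ["sku-123", "doc-a", "doc-b"]   # BM25 order
dense = ["doc-a", "doc-c", "sku-123"]     # ANN order

fused = rrf_fuse([lexical, dense])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still appears in the fused list. The trade-off worth naming is that rank-based fusion discards score magnitude, so a very confident exact-match hit counts the same as a marginal one at the same rank. Score normalization or a learned reranker over the fused candidates are alternatives to discuss.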

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
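As a rough illustration of clarifying scale first, the arithmetic below sizes the dense index and the fan-out load. Only the document count and QPS come from the question. The embedding dimension, float32 storage, and shard count are assumptions chosen for the example, and a real answer would confirm them with the interviewer.

```python
DOCS = 200_000_000        # from the question
QPS = 8_000               # from the question

DIM = 768                 # assumption: embedding dimension
BYTES_PER_FLOAT = 4       # assumption: uncompressed float32 vectors
SHARDS = 32               # assumption: shards per retriever
RETRIEVERS = 2            # lexical + dense, each queried on every request

# Raw vector storage before any ANN graph overhead or quantization.
raw_vector_bytes = DOCS * DIM * BYTES_PER_FLOAT
raw_vector_gb = raw_vector_bytes / 1e9

# Per-shard footprint if vectors are spread evenly.
docs_per_shard = DOCS // SHARDS
per_shard_gb = raw_vector_gb / SHARDS

# Every query fans out to every shard of both retrievers.
shard_requests_per_sec = QPS * SHARDS * RETRIEVERS

print(f"raw vectors: {raw_vector_gb:.1f} GB")
print(f"per shard:   {per_shard_gb:.1f} GB, {docs_per_shard:,} docs")
print(f"fan-out:     {shard_requests_per_sec:,} shard requests/s")
```

Under these assumptions the uncompressed vectors alone exceed a single machine's memory, which motivates sharding and quantization. The fan-out figure shows why the p99 target depends on the slowest shard rather than the average one.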
