Code Room
System design · Hard
Question
Design the low-level posting-list storage and query-evaluation engine of an inverted index that must answer Boolean and phrase queries over a 5-billion-document corpus with very long posting lists for common terms (the term 'the' appears in billions of documents). A naive 'intersect two sorted doc-id lists' merge is too slow when one list is billions of entries long and the other is short. How do you store posting lists compactly and evaluate AND/OR/phrase queries so that the cost scales with the *smaller* list and with the result size, not with the giant list?
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
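As one illustration of going deep on a hard part, here is a minimal sketch of a block-compressed posting list with a skip table, and an AND that is driven by the shorter list. It is one possible approach, not a reference design. The names `PostingList`, `contains_from`, `intersect`, and the block size of 128 are all hypothetical choices made for this example. It assumes doc ids are non-negative and strictly increasing, and it leaves out term positions, which a phrase query would also need.

```python
from bisect import bisect_left

BLOCK_SIZE = 128  # assumed block size for this sketch; real engines tune it


def encode_varint(n, out):
    """Append n to bytearray `out` as a base-128 varint (7 bits per byte)."""
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)


def decode_block(data, base):
    """Decode one block of delta-encoded varints; `base` is the preceding doc id."""
    ids, cur, shift, value = [], base, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            cur += value          # undo the delta encoding
            ids.append(cur)
            shift, value = 0, 0
    return ids


class PostingList:
    """Sorted doc ids stored as compressed blocks plus a small skip table."""

    def __init__(self, doc_ids, block_size=BLOCK_SIZE):
        self.blocks = []    # compressed bytes, one entry per block
        self.bases = []     # doc id just before each block (delta base)
        self.last_ids = []  # skip table: last doc id in each block
        prev = 0
        for i in range(0, len(doc_ids), block_size):
            chunk = doc_ids[i:i + block_size]
            buf = bytearray()
            self.bases.append(prev)
            for d in chunk:
                encode_varint(d - prev, buf)  # store gaps, not absolute ids
                prev = d
            self.blocks.append(bytes(buf))
            self.last_ids.append(chunk[-1])

    def contains_from(self, target, start_block):
        """Return (found, block_index), decoding only the one candidate block."""
        b = bisect_left(self.last_ids, target, lo=start_block)
        if b == len(self.blocks):
            return False, b  # target is past the end of this list
        ids = decode_block(self.blocks[b], self.bases[b])
        j = bisect_left(ids, target)
        return (j < len(ids) and ids[j] == target), b


def intersect(short, long):
    """AND of two lists: walk the short one, skip through the long one."""
    result, block = [], 0
    for b in range(len(short.blocks)):
        for doc in decode_block(short.blocks[b], short.bases[b]):
            found, block = long.contains_from(doc, block)
            if block == len(long.blocks):
                return result  # long list exhausted; nothing more can match
            if found:
                result.append(doc)
    return result
```

With this layout, each probe costs a binary search over the skip table plus the decoding of a single block, so the work grows with the length of the short list rather than the long one. The sketch re-decodes a block when several probes land in it; caching the most recently decoded block would be a natural refinement.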