System design | Hard | sd-g341

Subject: Inverted index | Level: Senior–Staff | ~45 min | Common in Databases & SQL interviews | Industries: Technology, Software development

Question

Design the low-level posting-list storage and query-evaluation engine of an inverted index that must answer boolean and phrase queries over a 5B-document corpus with very long posting lists for common terms (the term 'the' appears in billions of docs). A naive linear merge of two sorted doc-ID lists is too slow when one list is billions of entries long and the other is short. How do you store posting lists compactly and evaluate AND/OR/phrase queries so the cost scales with the *smaller* list and with the result, not with the giant list?
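One common way to keep long posting lists compact is to store sorted doc IDs as gaps (deltas) and encode each gap with a variable-byte code. The list is split into fixed-size blocks with a small skip table (last doc ID and byte offset per block), so a reader can jump to a single block and decode only that block. Below is a minimal Python sketch. The block size of 128 and the helper names (`encode_postings`, `decode_block`) are hypothetical. Production engines often use faster bit-packing block codecs, but the layout idea is the same.

```python
from typing import List, Tuple

BLOCK_SIZE = 128  # hypothetical block size; each block is decodable on its own


def vbyte_encode(n: int, out: bytearray) -> None:
    """Variable-byte encode a non-negative int: 7 data bits per byte,
    high bit set on the final byte of the value."""
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)


def vbyte_decode(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one integer starting at buf[pos]; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b & 0x80:
            return value, pos
        shift += 7


def encode_postings(doc_ids: List[int]) -> Tuple[List[Tuple[int, int]], bytes]:
    """Delta + vbyte encode strictly increasing doc IDs in fixed-size blocks.

    Returns (skips, data), where skips[i] = (last doc ID in block i, byte
    offset where block i starts). The first gap of each block is taken
    relative to the previous block's last doc ID, which the skip table
    already stores, so any block decodes without touching earlier blocks."""
    data = bytearray()
    skips: List[Tuple[int, int]] = []
    for start in range(0, len(doc_ids), BLOCK_SIZE):
        block = doc_ids[start:start + BLOCK_SIZE]
        skips.append((block[-1], len(data)))
        prev = doc_ids[start - 1] if start > 0 else 0
        for d in block:
            vbyte_encode(d - prev, data)
            prev = d
    return skips, bytes(data)


def decode_block(skips: List[Tuple[int, int]], data: bytes, i: int) -> List[int]:
    """Decode only block i, using the skip table for its base ID and byte range."""
    prev = skips[i - 1][0] if i > 0 else 0
    pos = skips[i][1]
    end = skips[i + 1][1] if i + 1 < len(skips) else len(data)
    out: List[int] = []
    while pos < end:
        gap, pos = vbyte_decode(data, pos)
        prev += gap
        out.append(prev)
    return out
```

For a list like the one for 'the', gaps are tiny, so most postings cost one byte instead of four or eight. The per-block skip table is what later lets AND evaluation jump over whole blocks of the giant list without decompressing them.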

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
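The core bottleneck is an AND where one list has billions of entries. A common approach is to walk the shorter list and jump ahead in the longer one with exponential ("galloping") search. This makes the work roughly O(m log(n/m)) for list lengths m ≤ n, rather than O(m + n) for a linear merge. Below is a minimal in-memory Python sketch. It assumes already-decompressed doc-ID lists, and the function names and the positional layout used for phrases are hypothetical. In a real engine, the long jumps would consult per-block skip data so untouched blocks are never decoded.

```python
from bisect import bisect_left
from heapq import merge
from typing import Dict, List, Sequence


def gallop(arr: Sequence[int], target: int, lo: int) -> int:
    """First index >= lo with arr[idx] >= target (len(arr) if none).
    Probes lo, lo+1, lo+3, lo+7, ... then binary-searches the last gap,
    so the cost is O(log distance) rather than O(distance)."""
    step, hi = 1, lo
    while hi < len(arr) and arr[hi] < target:
        lo = hi + 1
        hi += step
        step *= 2
    return bisect_left(arr, target, lo, min(hi, len(arr)))


def intersect(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """AND of two sorted doc-ID lists, driven by the shorter one."""
    short, long = (a, b) if len(a) <= len(b) else (b, a)
    out: List[int] = []
    pos = 0
    for d in short:
        pos = gallop(long, d, pos)  # never moves backwards
        if pos == len(long):
            break
        if long[pos] == d:
            out.append(d)
    return out


def union(*lists: Sequence[int]) -> List[int]:
    """OR via k-way heap merge with de-duplication. Total input is at most
    k times the result size, so the cost tracks the output."""
    out: List[int] = []
    for d in merge(*lists):
        if not out or out[-1] != d:
            out.append(d)
    return out


def phrase(doc_lists: List[List[int]], positions: List[Dict[int, List[int]]]) -> List[int]:
    """Docs where terms 0..k-1 appear consecutively. doc_lists[i] is term i's
    sorted doc IDs; positions[i] maps doc ID -> word positions of term i."""
    # 1. Doc-level AND, starting from the rarest term so the work tracks it.
    order = sorted(range(len(doc_lists)), key=lambda i: len(doc_lists[i]))
    docs: List[int] = list(doc_lists[order[0]])
    for i in order[1:]:
        docs = intersect(docs, doc_lists[i])
    # 2. Positional check only on surviving docs: term i must sit at start + i.
    hits: List[int] = []
    for d in docs:
        starts = set(positions[0][d])
        for i in range(1, len(positions)):
            starts &= {p - i for p in positions[i][d]}
            if not starts:
                break
        if starts:
            hits.append(d)
    return hits
```

For example, `intersect([5, 42, 999_999], list(range(1_000_000)))` performs three short gallops instead of scanning a million entries. Phrase evaluation reuses the same AND first, so positions are read only for documents that already contain every term.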
