Question
Design the query-understanding layer that runs *before* retrieval and rewrites a raw query into a structured search request. It must: detect intent (navigational 'nike.com' vs. product 'running shoes' vs. question), recognize entities ('nike' = brand, 'size 10' = attribute, 'under $80' = price filter), expand synonyms ('sneakers' ⇄ 'trainers'), and segment multi-token brands ('north face'). Latency budget: < 25ms, because it's on the critical path before the actual search. How do you build this so it's fast, debuggable, and improvable without a model retrain for every new synonym?
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.