Question
Design a RAG backend for a news-and-research assistant where freshness is the product: a document published two minutes ago must be retrievable, and answers must prefer recent, authoritative sources. The corpus is ~200M documents, growing by ~500k/day; queries arrive at ~2k QPS with a 400 ms retrieval budget before the LLM call. Walk through the pipeline that takes a document from ingestion to retrievable and keeps the index fresh, how you combine lexical and dense retrieval, and how a re-ranking stage balances semantic relevance against recency and source authority without re-embedding the whole corpus on every edit.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
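For orientation, the stated figures can be turned into rough averages before any design work. The sketch below uses only the numbers given in the question. The function and key names are illustrative, and it assumes uniform arrival rates, which real news traffic will not follow (publishing and query load are bursty), so treat the results as floor estimates rather than capacity targets.

```python
# Back-of-envelope averages derived only from the numbers in the question.
# Assumption: arrivals are spread uniformly over the day; real peaks will be higher.
SECONDS_PER_DAY = 24 * 60 * 60

CORPUS_DOCS = 200_000_000        # ~200M documents
NEW_DOCS_PER_DAY = 500_000       # ~500k/day growth
QUERIES_PER_SECOND = 2_000       # ~2k QPS
RETRIEVAL_BUDGET_MS = 400        # retrieval budget before the LLM call
FRESHNESS_TARGET_S = 2 * 60      # "published two minutes ago must be retrievable"


def scale_summary() -> dict:
    """Return average-rate figures implied by the stated constraints."""
    ingest_per_second = NEW_DOCS_PER_DAY / SECONDS_PER_DAY
    return {
        # Average write rate the ingestion pipeline must sustain.
        "ingest_per_second": ingest_per_second,
        # Daily growth relative to the current corpus size.
        "daily_growth_pct": 100 * NEW_DOCS_PER_DAY / CORPUS_DOCS,
        # Days until the corpus doubles at a constant growth rate.
        "days_to_double": CORPUS_DOCS / NEW_DOCS_PER_DAY,
        # Little's law: retrievals in flight if each one uses the full budget.
        "in_flight_retrievals": QUERIES_PER_SECOND * RETRIEVAL_BUDGET_MS / 1000,
        # Average number of documents that arrive inside one freshness window.
        "docs_in_freshness_window": ingest_per_second * FRESHNESS_TARGET_S,
    }


if __name__ == "__main__":
    for name, value in scale_summary().items():
        print(f"{name}: {value:.2f}")
```

On these averages the write path is light (a handful of documents per second) while the read path is the heavier one, which suggests the freshness requirement is more about end-to-end indexing latency than raw ingest throughput.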