Code Room
System design · Hard · sd-g730
Subject: ML inference · Level: Senior–Staff · ~45 min · Common in ML systems interviews · Industries: Technology

Question

Design a retrieval-augmented-generation (RAG) serving system for an enterprise knowledge assistant over ~50M internal documents (wikis, tickets, code, PDFs). Users ask natural-language questions and get a grounded, cited answer in under 3 seconds p95. The corpus changes constantly (docs edited/deleted hourly), answers must respect per-user document permissions (a user must never see content from a doc they can't access), and answer quality and groundedness matter more than raw latency. Query volume is ~200 QPS.
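One way to make the permission and freshness requirements concrete is to treat them as a hard filter applied to retrieved candidates before anything reaches the generator. The sketch below is only an illustration under stated assumptions: `Chunk`, `allowed_groups`, `deleted_doc_ids`, and `authorized_top_k` are hypothetical names, and group-based ACLs stored alongside each index entry are an assumed data model, not something the question specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage (hypothetical shape, assumed for illustration)."""
    doc_id: str
    text: str
    score: float               # retrieval similarity, higher is better
    allowed_groups: frozenset  # assumed: ACL groups stored with the index entry


def authorized_top_k(candidates, user_groups, deleted_doc_ids, k):
    """Return the k best chunks the user is allowed to read.

    A chunk survives only if its document has not been deleted and the
    user shares at least one group with the chunk's ACL.
    """
    visible = [
        c for c in candidates
        if c.doc_id not in deleted_doc_ids and (c.allowed_groups & user_groups)
    ]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:k]
```

Filtering after retrieval is the simplest form to reason about, but it can return fewer than `k` results when most candidates are hidden; pushing the same predicate into the index query is a common alternative worth discussing as a trade-off.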

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
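Clarifying scale usually starts with a back-of-envelope estimate. The sketch below shows the kind of arithmetic involved. Only the 50M documents, 200 QPS, and 3-second p95 come from the question; the chunks-per-document count, embedding dimension, and float32 storage are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
def index_size_bytes(num_docs, chunks_per_doc, dim, bytes_per_value):
    """Raw size of an uncompressed vector index, ignoring metadata and overhead."""
    return num_docs * chunks_per_doc * dim * bytes_per_value


def in_flight_requests(qps, latency_seconds):
    """Little's law: average concurrent requests = arrival rate * time in system."""
    return qps * latency_seconds


NUM_DOCS = 50_000_000   # from the question
QPS = 200               # from the question
P95_LATENCY_S = 3.0     # from the question (used here as a conservative bound)

CHUNKS_PER_DOC = 20     # assumption: average chunks per document
EMBED_DIM = 768         # assumption: embedding dimension
BYTES_PER_VALUE = 4     # assumption: float32 vectors, no quantization

raw = index_size_bytes(NUM_DOCS, CHUNKS_PER_DOC, EMBED_DIM, BYTES_PER_VALUE)
print(f"raw vector index: {raw / 1e12:.2f} TB")  # 3.07 TB under these assumptions
print(f"requests in flight: {in_flight_requests(QPS, P95_LATENCY_S):.0f}")  # 600
```

Under these assumptions the raw index does not fit in the memory of a single machine, which is what motivates discussing sharding and vector compression; different chunking or dimension choices change the number substantially.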
