System design · Hard · sd-g105

Subject: Model monitoring · Level: Senior–Staff · ~45 min · Common in Reliability & on-call interviews · Industries: Technology, Software development

Question

Design the production quality-monitoring system for a customer-support LLM assistant handling millions of conversations a day. Unlike a classifier, there's no single accuracy number, and outputs are free-form text that can be subtly wrong, unhelpful, off-policy, or hallucinated. You need to detect quality regressions (e.g. after a prompt change or model upgrade) quickly, flag harmful or hallucinated responses, and do it without humans reading every conversation. Design the monitoring and evaluation system for generative output at scale.
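One possible building block for the "detect quality regressions quickly" requirement goes like this, and it is an assumption rather than part of the question. An automated grader (for example an LLM judge or a policy classifier) flags each sampled conversation, and the flag rates of the baseline and the candidate prompt or model are compared with a one-sided two-proportion z-test. This is a minimal sketch. `WindowStats`, `two_proportion_z`, `is_regression`, and the `alpha` and `min_abs_increase` thresholds are hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Grader output for one arm (baseline or candidate) over a time window (hypothetical)."""
    flagged: int  # conversations the automated grader flagged (e.g. hallucinated, off-policy)
    total: int    # conversations scored by the grader in this window


def two_proportion_z(baseline: WindowStats, candidate: WindowStats) -> tuple[float, float]:
    """Return (z, one-sided p-value) for H1: candidate flag rate > baseline flag rate."""
    if baseline.total <= 0 or candidate.total <= 0:
        raise ValueError("each arm needs at least one scored conversation")
    p_base = baseline.flagged / baseline.total
    p_cand = candidate.flagged / candidate.total
    # Pooled rate under H0 (both arms share the same true flag rate).
    pooled = (baseline.flagged + candidate.flagged) / (baseline.total + candidate.total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / baseline.total + 1 / candidate.total))
    if se == 0:  # no flags at all (or everything flagged): no evidence either way
        return 0.0, 1.0
    z = (p_cand - p_base) / se
    # Upper-tail standard-normal probability: P(Z > z) = erfc(z / sqrt(2)) / 2.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


def is_regression(baseline: WindowStats, candidate: WindowStats,
                  alpha: float = 0.01, min_abs_increase: float = 0.002) -> bool:
    """Alert only if the increase is both statistically and practically significant."""
    _, p_value = two_proportion_z(baseline, candidate)
    delta = candidate.flagged / candidate.total - baseline.flagged / baseline.total
    return p_value < alpha and delta >= min_abs_increase


if __name__ == "__main__":
    baseline = WindowStats(flagged=120, total=20_000)   # 0.60% flagged
    candidate = WindowStats(flagged=190, total=20_000)  # 0.95% flagged
    print(is_regression(baseline, candidate))  # True
```

The `min_abs_increase` floor keeps tiny but statistically significant shifts at high volume from paging on-call. If the test is rerun on every streaming window, the false-positive rate grows, so a sequential test or a stricter `alpha` is a trade-off to weigh in the design.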

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
