Code Room
System design · Hard · sd-g729
Subject: Model serving · Level: Senior–Staff · ~45 min · Common in ML systems interviews · Industries: Technology

Question

Design a model CI/CD and deployment system that lets ~150 teams promote models from training to production safely. A new model version must be validatable offline (eval metrics, fairness/slice checks), then run in shadow against live traffic without affecting users, then roll out via canary with automatic rollback on metric regression, all without redeploying the serving binary. Serving handles 100k req/sec; deploys happen dozens of times a day across teams. You must support fast rollback and keep a clear audit trail of which model version produced which prediction.
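One way to meet the "no serving-binary redeploy" requirement is to treat each promotion step as a change to versioned routing data that serving hosts load at runtime. Shadow, canary and rollback then become edits to that data. The sketch below is a minimal illustration under stated assumptions. `RoutingConfig`, `bucket`, `route`, `should_rollback`, `rollback` and the model ids are hypothetical names. Canary assignment uses a SHA-256 hash of a request or user id. The rollback rule is a plain relative-threshold check on an error rate, not the statistical test a production system would more likely use.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RoutingConfig:
    """Hypothetical routing state for one model, versioned and pushed to serving hosts."""
    config_version: int
    stable: str                    # registry id of the model serving most traffic
    canary: Optional[str] = None   # candidate whose output is returned for a slice of traffic
    canary_pct: float = 0.0        # fraction of traffic in [0, 1] routed to the canary
    shadow: Optional[str] = None   # candidate scored on mirrored traffic, never returned


def bucket(key: str) -> float:
    """Map a request or user id to a stable pseudo-uniform value in [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") >> 11  # keep 53 bits so the float is exact
    return value / 2**53


def route(cfg: RoutingConfig, key: str) -> tuple[str, Optional[str]]:
    """Return (version whose prediction is served, version to run in shadow or None)."""
    served = cfg.stable
    if cfg.canary is not None and bucket(key) < cfg.canary_pct:
        served = cfg.canary
    return served, cfg.shadow


def should_rollback(stable_err: float, canary_err: float, canary_n: int,
                    max_rel_regression: float = 0.05, min_samples: int = 10_000) -> bool:
    """Naive guardrail: roll back once the canary's error rate exceeds stable's by the margin."""
    if canary_n < min_samples:
        return False  # too few canary requests to judge yet
    return canary_err > stable_err * (1 + max_rel_regression)


def rollback(cfg: RoutingConfig) -> RoutingConfig:
    """Emit the next config version with the canary removed; the stable model is unchanged."""
    return replace(cfg, config_version=cfg.config_version + 1, canary=None, canary_pct=0.0)


# Usage with hypothetical model ids: 5% canary, then an automatic rollback decision.
cfg = RoutingConfig(config_version=7, stable="fraud-v41", canary="fraud-v42", canary_pct=0.05)
served, shadow = route(cfg, "user-123")
if should_rollback(stable_err=0.020, canary_err=0.026, canary_n=40_000):
    cfg = rollback(cfg)  # config_version 8, all traffic back on fraud-v41
```

Rollback only emits a new config version, so it takes effect as quickly as hosts pick up config, and the stable model stays loaded the whole time. Hashing on a user id rather than a request id would keep each user on one arm for the life of the canary.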

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
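Clarifying scale first changes where the hard parts are. At 100k req/sec, the per-prediction audit trail is a far heavier write load than model deploys, which happen only dozens of times a day. Below is a minimal Python sketch of that arithmetic and of one possible audit row. It assumes a hypothetical 200-byte record. `PredictionAudit`, its fields and `daily_audit_volume` are illustrative names, not a prescribed schema.

```python
from __future__ import annotations

from dataclasses import dataclass

REQ_PER_SEC = 100_000          # serving load stated in the prompt
SECONDS_PER_DAY = 86_400
BYTES_PER_RECORD = 200         # hypothetical: ids, versions, timestamp, compact prediction


@dataclass(frozen=True)
class PredictionAudit:
    """Hypothetical audit row tying one prediction to the exact model that produced it."""
    request_id: str
    model_version: str     # immutable registry id of the model that produced this output
    config_version: int    # routing config in force when the request was handled
    arm: str               # "stable", "canary", or "shadow" (shadow rows are never served)
    ts_ms: int             # request timestamp, epoch milliseconds
    prediction: float


def daily_audit_volume(req_per_sec: int = REQ_PER_SEC,
                       bytes_per_record: int = BYTES_PER_RECORD) -> tuple[int, int]:
    """Return (records/day, raw bytes/day) assuming one audit row per served prediction."""
    records = req_per_sec * SECONDS_PER_DAY
    return records, records * bytes_per_record


records, size = daily_audit_volume()
print(f"{records:,} records/day, ~{size / 1e12:.2f} TB/day before compression")
# prints: 8,640,000,000 records/day, ~1.73 TB/day before compression
```

At roughly 1.7 TB/day raw under that assumption, plus extra rows for any model running in shadow, the audit log points toward an append-only, partitioned store fed asynchronously from serving rather than synchronous writes in the request path.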
