Code Room
System design · Hard · sd-g584
Subject: Model serving
Level: Senior–Staff
Time: ~55 min
Common in ML systems interviews
Industries: Technology

Question

Design an online model-serving platform that hosts thousands of heterogeneous models (from small CPU classifiers to large GPU transformers) for many internal teams, with per-model latency SLOs ranging from 10 ms to 500 ms and bursty traffic. The platform must support safe rollouts, multi-version traffic splitting, and autoscaling to zero for cold models, and it must prevent one noisy tenant from starving another. Cover the serving architecture, how you schedule work across CPU and GPU, and the rollout and observability story.

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
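
When going deep on one of the hard parts, a small concrete sketch can help anchor the discussion. The following is one possible way to illustrate multi-version traffic splitting, not a prescribed answer: it hashes a request key to a stable point in [0, 1) and walks the cumulative weights, so the same key always lands on the same version while overall traffic follows the configured split. The function name `pick_version`, the `weights` mapping, and the version labels are hypothetical names chosen for illustration.

```python
import hashlib


def pick_version(request_key: str, weights: dict[str, float]) -> str:
    """Deterministically map a request key to a model version by weight.

    Hypothetical helper for illustration; `weights` maps a version label
    to a non-negative relative weight (e.g. {"v1": 90, "v2": 10}).
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    # Keep only versions that should receive traffic, in a fixed order.
    active = [(v, w) for v, w in sorted(weights.items()) if w > 0]
    if not active:
        raise ValueError("at least one version needs a positive weight")
    total = sum(w for _, w in active)

    # Hash the key to a stable point in [0, 1).
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64

    # Walk the cumulative distribution until the point falls in a bucket.
    cumulative = 0.0
    for version, weight in active:
        cumulative += weight / total
        if point < cumulative:
            return version
    return active[-1][0]  # guard against floating-point rounding


if __name__ == "__main__":
    split = {"v1": 90, "v2": 10}  # assumed 90/10 canary split
    print(pick_version("user-123", split))
```

Hashing on a stable key (a user or session ID, say) keeps a caller pinned to one version during a rollout, which makes per-version metrics easier to compare. The trade-off is that a skewed key distribution can skew the realized split, which is worth naming in the interview.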
