System design · Hard · sd-g099

Subject: Model serving · Level: Senior–Staff · ~45 min · Common in ML systems interviews · Industries: Technology, Software development

Question

Design a multi-tenant model-hosting platform (think 'serverless inference') where thousands of customers each deploy their own custom models, most of which receive sparse, bursty traffic — many models get a request once an hour, a few get thousands per second. You can't keep every model loaded on a GPU (too expensive), but cold-starting a model takes 5-30 seconds. Design the system to serve this long-tail traffic cost-efficiently while keeping cold-start pain acceptable.
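One common building block for this trade-off is a per-GPU warm pool. It keeps recently used models resident and evicts the least recently used one when GPU memory runs out. The following is a minimal, hypothetical sketch, not part of any particular serving framework:

- `WarmModelCache`, `acquire`, and the GB-based memory budget are illustrative names and simplifications.
- The actual weight loading (the 5-30 second cold start) is left as a comment.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class LoadResult:
    model_id: str
    cold_start: bool       # True if the model had to be loaded for this request
    evicted: list[str]     # models unloaded to make room for it


class WarmModelCache:
    """Hypothetical per-GPU warm pool: keeps recently used models resident
    until their combined size would exceed the GPU memory budget (plain LRU)."""

    def __init__(self, gpu_budget_gb: float):
        self.gpu_budget_gb = gpu_budget_gb
        self.used_gb = 0.0
        # model_id -> size in GB, ordered from least to most recently used
        self._resident: "OrderedDict[str, float]" = OrderedDict()

    def acquire(self, model_id: str, size_gb: float) -> LoadResult:
        if model_id in self._resident:
            # Warm hit: mark as most recently used; no load cost.
            self._resident.move_to_end(model_id)
            return LoadResult(model_id, cold_start=False, evicted=[])
        if size_gb > self.gpu_budget_gb:
            raise ValueError(f"{model_id} ({size_gb} GB) exceeds the GPU budget")
        evicted = []
        # Evict least recently used models until the new one fits.
        while self._resident and self.used_gb + size_gb > self.gpu_budget_gb:
            victim, victim_size = self._resident.popitem(last=False)
            self.used_gb -= victim_size
            evicted.append(victim)
        # Assumption: in a real system, weights would be pulled here from a
        # host-RAM / local-disk cache or object storage (the cold start).
        self._resident[model_id] = size_gb
        self.used_gb += size_gb
        return LoadResult(model_id, cold_start=True, evicted=evicted)

    def resident(self) -> list[str]:
        """Resident models, least recently used first."""
        return list(self._resident)
```

For example, take a 40 GB budget and three 16 GB models. Re-acquiring `tenant-a` after `tenant-b` makes `tenant-a` most recently used, so loading `tenant-c` evicts `tenant-b` instead. Plain LRU is only a starting point. A fuller design would likely add:

- per-tenant quotas, so one noisy tenant cannot flush everyone else;
- a host-RAM or local-disk tier to shorten cold loads;
- dedicated, pinned replicas for the few high-traffic models.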

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
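One trade-off worth quantifying is the idle timeout: how long a model stays loaded after its last request. The following is a back-of-envelope sketch that assumes Poisson arrivals and ignores the load time itself. That assumption understates the burstiness the question describes, so treat the numbers as rough. `keep_warm_tradeoff` is a hypothetical helper.

```python
import math


def keep_warm_tradeoff(requests_per_hour: float, idle_timeout_s: float) -> tuple[float, float]:
    """Return (cold_start_fraction, warm_time_fraction) for a model that is
    unloaded after idle_timeout_s seconds without traffic.

    Assumes Poisson arrivals (exponential gaps between requests) and ignores
    the time spent loading; real bursty traffic will deviate from this.
    """
    rate_per_s = requests_per_hour / 3600.0
    # A request is cold iff the gap since the previous request exceeded the
    # timeout: P(gap > T) = exp(-rate * T).
    cold_fraction = math.exp(-rate_per_s * idle_timeout_s)
    # Each gap keeps the model warm for min(gap, T), whose mean is
    # (1 - exp(-rate * T)) / rate; dividing by the mean gap (1 / rate) gives
    # the long-run fraction of time the model occupies the GPU.
    warm_fraction = 1.0 - math.exp(-rate_per_s * idle_timeout_s)
    return cold_fraction, warm_fraction


for rph, timeout in [(1, 300), (1, 3600), (3_600_000, 60)]:
    cold, warm = keep_warm_tradeoff(rph, timeout)
    print(f"{rph:>9} req/h, timeout {timeout:>4}s: cold {cold:.1%}, GPU held {warm:.1%}")
```

Under these assumptions, the fraction of cold requests equals the fraction of time the model is unloaded. For a model hit once an hour, a 5-minute timeout leaves roughly 92% of requests cold. Cutting that meaningfully means holding GPU memory for most of the hour. This is why long-tail designs tend to focus on making cold starts cheaper rather than only tuning the timeout. Options include a host-RAM or local-SSD weight cache, packing many small models per GPU, and prewarming on predicted traffic.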
