Code Room
System design · Hard
Question
Design a multi-tenant model-hosting platform (think "serverless inference") where thousands of customers each deploy their own custom models. Most of these models receive sparse, bursty traffic: many get a request about once an hour, while a few get thousands per second. You can't keep every model loaded on a GPU (too expensive), but cold-starting a model takes 5 to 30 seconds. Design the system to serve this long-tail traffic cost-efficiently while keeping cold-start latency acceptable.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
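One way to make the core trade-off concrete while narrating is a toy model of a single GPU worker that keeps only a few models resident and evicts the least recently used one. The sketch below is illustrative only: `ModelCache`, `replay`, and the `capacity` parameter are hypothetical names, LRU is just one possible eviction policy, and a real platform would also weigh model size, load time, and per-tenant priority.

```python
from collections import OrderedDict


class ModelCache:
    """Toy LRU set of models resident on one GPU worker (hypothetical)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # Assumption: GPU memory fits a fixed number of equally sized models.
        self.capacity = capacity
        self.resident = OrderedDict()  # model_id -> None, oldest first
        self.cold_starts = 0
        self.warm_hits = 0

    def serve(self, model_id):
        """Handle one request; return True if the model was already loaded."""
        if model_id in self.resident:
            # Warm path: mark the model as most recently used.
            self.resident.move_to_end(model_id)
            self.warm_hits += 1
            return True
        # Cold path: evict the least recently used model if the GPU is full.
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)
        self.resident[model_id] = None
        self.cold_starts += 1
        return False


def replay(trace, capacity):
    """Replay a sequence of model ids; return (cold_starts, warm_hits)."""
    cache = ModelCache(capacity)
    for model_id in trace:
        cache.serve(model_id)
    return cache.cold_starts, cache.warm_hits


if __name__ == "__main__":
    # One busy model interleaved with three long-tail models.
    trace = ["hot", "hot", "a", "hot", "b", "hot", "c", "hot"]
    print(replay(trace, capacity=2))
```

With two slots, the busy model stays resident after its first load while each long-tail model pays one cold start. Replaying a realistic trace at different capacities gives a rough way to reason about how much GPU memory buys how many avoided cold starts.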