Code Room
System design · Medium · sd-g102
Subject: Online inference
Level: Mid–Senior
~40 min
Common in: ML systems · Distributed systems interviews
Industries: Technology

Question

Design the real-time ad ranking service that scores candidate ads for each ad slot on a social feed. For every impression it must fetch dozens of features (user features, ad features, context) from multiple stores, run a ranking model over ~500 candidate ads, and return the winner within a 30ms budget — at 1M requests/second. A slow or failed feature store call cannot be allowed to blow the latency budget or drop the whole request. Design the inference service with its feature-fetch fan-out and degradation strategy.
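One way to picture the fan-out and degradation requirement is a deadline-bounded parallel fetch where every store call has its own timeout and a fallback value. The sketch below is only an illustration under stated assumptions: the store names, the `DEFAULTS` values, the 10 ms per-call timeout, and the helper names `fetch_with_fallback` and `fetch_all` are all hypothetical and do not come from the question.

```python
import asyncio

# Hypothetical fallback features, used when a store is slow or fails.
DEFAULTS = {
    "user": {"ctr_7d": 0.01},
    "ad": {"bid": 0.0},
    "context": {"hour": 0},
}

async def fetch_with_fallback(name, fetch, timeout_s):
    """Fetch one store's features; never raise. Returns (features, degraded)."""
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout_s), False
    except Exception:
        # Covers the timeout raised by wait_for as well as store errors.
        return DEFAULTS[name], True

async def fetch_all(stores, timeout_s=0.010):
    """Fan out to all stores in parallel; the slowest call is capped at timeout_s."""
    names = list(stores)
    results = await asyncio.gather(
        *(fetch_with_fallback(n, stores[n], timeout_s) for n in names)
    )
    features = {n: r[0] for n, r in zip(names, results)}
    degraded = [n for n, r in zip(names, results) if r[1]]
    return features, degraded

# Three stand-in stores: one healthy, one slow, one failing.
async def fast_user_store():
    await asyncio.sleep(0.001)
    return {"ctr_7d": 0.04}

async def slow_ad_store():
    await asyncio.sleep(0.2)  # far beyond the per-call timeout
    return {"bid": 1.5}

async def failing_context_store():
    raise ConnectionError("store unavailable")

def demo():
    stores = {
        "user": fast_user_store,
        "ad": slow_ad_store,
        "context": failing_context_store,
    }
    # A looser 50 ms timeout keeps the demo stable on a busy machine.
    return asyncio.run(fetch_all(stores, timeout_s=0.05))

if __name__ == "__main__":
    features, degraded = demo()
    print(features, degraded)
```

In this sketch the request always returns: the slow and failing stores contribute defaults, and the `degraded` list lets the ranker or the metrics pipeline see which features were substituted. Whether defaults, a cached last-known value, or a simpler fallback model is the right degradation is a trade-off the answer should argue for.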

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
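Clarifying scale usually starts with a quick back-of-envelope calculation. The sketch below uses the three figures given in the question; the number of feature stores and the per-host throughput are hypothetical assumptions chosen only to make the arithmetic concrete.

```python
from math import ceil

# Figures stated in the question.
REQUESTS_PER_SEC = 1_000_000
CANDIDATES_PER_REQUEST = 500
LATENCY_BUDGET_MS = 30

# Assumptions for illustration only (not from the question).
FEATURE_STORES = 3            # e.g. user, ad, context; one batched call each
QPS_PER_RANKER_HOST = 2_000   # hypothetical per-host throughput

# Model evaluations the fleet must sustain.
scores_per_sec = REQUESTS_PER_SEC * CANDIDATES_PER_REQUEST

# Feature-store calls, assuming one batched call per store per request.
store_calls_per_sec = REQUESTS_PER_SEC * FEATURE_STORES

# Upper bound on concurrent requests via Little's law (L = lambda * W),
# if every request used the full latency budget.
max_in_flight = REQUESTS_PER_SEC * LATENCY_BUDGET_MS // 1000

# Ranker hosts needed under the assumed per-host throughput, before headroom.
ranker_hosts = ceil(REQUESTS_PER_SEC / QPS_PER_RANKER_HOST)

if __name__ == "__main__":
    print(f"scores/sec:        {scores_per_sec:,}")
    print(f"store calls/sec:   {store_calls_per_sec:,}")
    print(f"max in flight:     {max_in_flight:,}")
    print(f"ranker hosts:      {ranker_hosts:,}")
```

Numbers like these make the bottlenecks concrete: 500 million scores per second points toward batched model evaluation per request, and millions of store calls per second points toward batching, caching, and strict per-call timeouts.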
