System design · Hard · sd-g359

Subject: Online inference · Level: Senior–Staff · ~45 min · Common in ML systems interviews · Industries: Technology, Software development

Question

Design the online inference path for a content-safety classifier that must score every user-generated post before it's shown, at 120k QPS with a hard 15ms p99 budget — if scoring is slow it stalls the publish path for real users. The model is a transformer classifier on GPU; naive per-request inference is too slow and too expensive. Walk through how you hit the QPS at that latency, how dynamic batching interacts with the tail-latency budget, and how you degrade gracefully if the model fleet is overloaded rather than blocking every post.
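
A back-of-envelope sizing script is one way to make the 15ms budget concrete before drawing boxes. Everything in it is a hypothetical placeholder, not a measurement. That includes the network/tokenization overhead, the batching window, the linear per-batch latency model, the 60% utilization target, and the names `Assumptions`, `max_safe_batch` and `replicas_needed`. It also assumes posts are truncated to a fixed token length, so batch cost depends only on batch size. The worst case is deliberately conservative. A request waits the full batching window, then waits behind one batch already running on the GPU, then waits for its own batch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumptions:
    # Every value is a hypothetical placeholder; replace with load-test data.
    p99_budget_us: int = 15_000   # hard p99 budget from the question (15 ms)
    overhead_us: int = 3_000      # assumed network + tokenization + serialization
    max_wait_us: int = 4_000      # assumed batching window
    base_infer_us: int = 2_000    # assumed fixed cost of one batched forward pass
    per_item_us: int = 100        # assumed marginal cost per item in the batch
    max_util_pct: int = 60        # run replicas below capacity so queues stay short


def infer_us(a: Assumptions, batch: int) -> int:
    """Hypothetical linear latency model for one batched forward pass."""
    return a.base_infer_us + a.per_item_us * batch


def max_safe_batch(a: Assumptions) -> int:
    """Largest batch whose conservative worst case still fits the p99 budget.

    Worst case: full batching window + one in-flight batch + own batch + overhead.
    """
    # max_wait + 2 * (base + per_item * b) + overhead <= budget, solved for b
    slack = a.p99_budget_us - a.max_wait_us - a.overhead_us - 2 * a.base_infer_us
    return max(0, slack // (2 * a.per_item_us))


def replicas_needed(a: Assumptions, qps: int) -> int:
    b = max_safe_batch(a)
    if b == 0:
        raise ValueError("budget cannot fit even one request; shrink the window or the model")
    # Upper-bound capacity of one replica if every batch is full.
    per_replica_qps = b * 1_000_000 // infer_us(a, b)
    usable_qps = per_replica_qps * a.max_util_pct // 100
    return math.ceil(qps / usable_qps)


if __name__ == "__main__":
    a = Assumptions()
    print(max_safe_batch(a))            # 20
    print(replicas_needed(a, 120_000))  # 40
```

With these placeholders the largest safe batch is 20 and the fleet needs 40 replicas. The numbers matter less than the shape of the trade-off. A longer batching window or a larger batch buys throughput per GPU, but it spends the same fixed p99 budget. Shrinking the window to 2ms in this model, for example, raises the safe batch to 30. So the window and the batch cap have to be tuned together against measured latency.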

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
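
The batching and graceful-degradation parts can be sketched as a single deadline-aware batcher. This is a minimal asyncio sketch, not a production server. `DeadlineBatcher`, `model_fn`, `fallback_fn` and all parameter defaults are hypothetical. `model_fn` stands in for a batched call to one GPU replica. `fallback_fn` stands in for whatever degraded policy the product chooses, such as a rules check, a much smaller model, or publishing with an asynchronous re-scan. The question does not specify which. The sketch shows three mechanisms:

- It flushes a batch when it is full or when the batching window expires.
- It sheds requests whose deadline can no longer be met.
- It rejects work at admission when the bounded queue is full, instead of letting it queue without limit.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple

# Hypothetical interfaces: a batched GPU call and a cheap degraded scorer.
ModelFn = Callable[[List[str]], Awaitable[List[float]]]
FallbackFn = Callable[[str], float]


class DeadlineBatcher:
    """Flush on size or time; degrade instead of blocking when overloaded."""

    def __init__(self, model_fn: ModelFn, fallback_fn: FallbackFn, *, max_batch: int = 32,
                 max_wait_s: float = 0.004, est_infer_s: float = 0.004,
                 deadline_s: float = 0.012, max_queue: int = 1024) -> None:
        self.model_fn = model_fn
        self.fallback_fn = fallback_fn
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s    # batching window
        self.est_infer_s = est_infer_s  # assumed duration of one batched forward pass
        self.deadline_s = deadline_s    # assumed share of the 15 ms left for this service
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)
        self.shed = 0                   # requests routed to the degraded path

    def _degrade(self, fut: asyncio.Future, text: str) -> None:
        self.shed += 1
        if not fut.done():
            fut.set_result((self.fallback_fn(text), "fallback"))

    async def score(self, text: str) -> Tuple[float, str]:
        fut = asyncio.get_running_loop().create_future()
        try:
            # Admission control: a full queue means the fleet is saturated.
            self.queue.put_nowait((text, time.monotonic() + self.deadline_s, fut))
        except asyncio.QueueFull:
            self.shed += 1
            return self.fallback_fn(text), "fallback"
        return await fut

    async def run(self) -> None:
        while True:
            batch = [await self.queue.get()]
            flush_at = time.monotonic() + self.max_wait_s
            # Keep collecting until the batch is full or the window closes.
            while len(batch) < self.max_batch:
                remaining = flush_at - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            # Shed requests that would finish past their deadline if run now.
            now = time.monotonic()
            live = []
            for text, deadline, fut in batch:
                if now + self.est_infer_s > deadline:
                    self._degrade(fut, text)
                else:
                    live.append((text, fut))
            if not live:
                continue
            try:
                scores = await self.model_fn([text for text, _ in live])
            except Exception:
                # Model fleet error: degrade this batch rather than fail the posts.
                for text, fut in live:
                    self._degrade(fut, text)
                continue
            for (_, fut), s in zip(live, scores):
                if not fut.done():
                    fut.set_result((s, "model"))
```

Each result carries a `"model"` or `"fallback"` tag. Callers can use it to hold or re-score posts that took the degraded path and to alert on the shed rate. Checking the deadline at flush time is what keeps batching from leaking into the tail: a request that would miss its budget is answered immediately by the fallback, so it does not run late and inflate p99. A production path would more likely rely on a serving framework's built-in dynamic batching, such as NVIDIA Triton's dynamic batcher. This sketch only illustrates the control flow.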
