System design · Hard · sd-g726

Subject: ML online learning · Level: Senior–Staff · ~45 min · Common in ML systems interviews · Industries: Technology

Question

Design an online-learning system that uses a contextual bandit to choose which of ~30 promotional offers to show each user at app-open, optimizing 7-day retained-conversion. You serve ~5M decisions/day with a 40ms p99 budget. Reward is delayed (you only know the 7-day outcome a week later) and partial (you only observe reward for the arm you actually showed). The offer catalog changes weekly, and you must avoid getting stuck always showing the currently-best offer (you need principled exploration), while not torching revenue with too much random exploration.
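To make the partial and delayed feedback concrete, here is a minimal sketch of one possible learner, not a reference answer. It assumes a per-offer LinUCB model (one common form of principled exploration; Thompson sampling would be another), a fixed-size context vector, and an in-memory `pending` map standing in for what would realistically be a decision log joined against outcomes a week later. All class and method names (`LinUCBArm`, `OfferBandit`, `set_catalog`, `record_outcome`) are hypothetical.

```python
import numpy as np


class LinUCBArm:
    """Ridge-regression reward model for a single offer (hypothetical helper)."""

    def __init__(self, dim, ridge=1.0):
        self.A = ridge * np.eye(dim)  # ridge * I + sum of x x^T
        self.b = np.zeros(dim)        # sum of reward * x

    def ucb(self, x, alpha):
        # Point estimate plus an uncertainty bonus that shrinks as data arrives.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return float(theta @ x + alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x


class OfferBandit:
    def __init__(self, dim, alpha=1.0):
        self.dim = dim
        self.alpha = alpha    # exploration strength (assumed tunable)
        self.arms = {}        # offer_id -> LinUCBArm
        self.pending = {}     # decision_id -> (offer_id, context)

    def set_catalog(self, offer_ids):
        # Weekly catalog change: keep state for surviving offers,
        # start new offers from the prior, drop retired ones.
        self.arms = {o: self.arms.get(o, LinUCBArm(self.dim)) for o in offer_ids}

    def choose(self, decision_id, x):
        # Pick the offer with the highest upper confidence bound.
        offer = max(self.arms, key=lambda o: self.arms[o].ucb(x, self.alpha))
        self.pending[decision_id] = (offer, x)  # remember what was shown
        return offer

    def record_outcome(self, decision_id, reward):
        # Delayed reward arrives later; only the shown arm is updated
        # because no reward is observed for the other offers.
        offer, x = self.pending.pop(decision_id)
        if offer in self.arms:  # the offer may have been retired meanwhile
            self.arms[offer].update(x, reward)


bandit = OfferBandit(dim=2)
bandit.set_catalog(["offer_a", "offer_b"])
x = np.array([1.0, 0.0])

first = bandit.choose("d1", x)
bandit.record_outcome("d1", 0.0)   # 7-day outcome: no retained conversion
second = bandit.choose("d2", x)    # uncertainty bonus now favors the untried offer
bandit.record_outcome("d2", 1.0)
third = bandit.choose("d3", x)
print(first, second, third)
```

In this toy run the learner moves to the untried offer after the first one returns no reward, then stays with the offer that converted. A production design would still need to address what the sketch leaves out: inverting a matrix per arm on every request is unlikely to fit the 40ms budget at ~30 arms without caching, and the week of unresolved decisions means the model is always acting on stale estimates.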

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
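As an example of clarifying scale, the numbers in the question can be turned into a quick back-of-envelope estimate. The sketch below uses only the figures stated in the question, except for `PEAK_TO_AVG`, which is an assumed peak-to-average traffic ratio that you would confirm with the interviewer.

```python
DECISIONS_PER_DAY = 5_000_000
REWARD_DELAY_DAYS = 7
NUM_OFFERS = 30
PEAK_TO_AVG = 5  # assumption: app-open traffic is bursty; not given in the question

SECONDS_PER_DAY = 86_400

# Average and (assumed) peak request rate for the decision service.
avg_qps = DECISIONS_PER_DAY / SECONDS_PER_DAY
peak_qps = avg_qps * PEAK_TO_AVG

# Decisions whose 7-day outcome is still unknown at any moment.
pending_decisions = DECISIONS_PER_DAY * REWARD_DELAY_DAYS

# Daily impressions per offer if traffic were spread evenly (an upper bound
# on how fast a brand-new offer could gather data under uniform exploration).
even_split_per_offer = DECISIONS_PER_DAY / NUM_OFFERS

print(f"avg QPS ~{avg_qps:.0f}, assumed peak QPS ~{peak_qps:.0f}")
print(f"decisions awaiting reward: {pending_decisions:,}")
print(f"even-split impressions per offer per day: {even_split_per_offer:,.0f}")
```

The average rate works out to roughly 58 decisions per second, so raw throughput is modest; the harder constraints are the 40ms p99 latency and the roughly 35 million decisions that are always waiting on a reward label.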

