Code Room
System design · Hard
Question
Design a contextual-bandit system that chooses, per user impression, which of many promotional offers to show. A fixed A/B test is a poor fit here: it wastes traffic on losing offers, and offers churn weekly, so you can never 'finish' an experiment. The reward (whether the user converted) is delayed by hours to days. Walk through the explore/exploit setup, how you serve a policy at low latency, and how delayed and biased rewards complicate learning compared to a clean A/B test.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
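As one possible starting point for the explore/exploit and delayed-reward parts, here is a minimal sketch, not a reference answer. It swaps in plain epsilon-greedy over per-offer conversion counts (no user context) because the probability of each choice is then easy to log; a real contextual policy would replace `_rate` with a model score. The class `EpsilonGreedyOffers`, the function `ips_value`, and all field names are hypothetical, chosen only for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class EpsilonGreedyOffers:
    """Illustrative, non-contextual epsilon-greedy over a changing offer set."""
    epsilon: float = 0.1
    shows: dict = field(default_factory=dict)        # offer_id -> resolved impressions
    conversions: dict = field(default_factory=dict)  # offer_id -> conversions
    pending: dict = field(default_factory=dict)      # impression_id -> (offer_id, propensity)

    def _rate(self, offer_id):
        # Smoothed rate: unseen offers start at 1/2, so newly launched offers get tried.
        return (self.conversions.get(offer_id, 0) + 1) / (self.shows.get(offer_id, 0) + 2)

    def choose(self, impression_id, live_offers, rng=random):
        best = max(live_offers, key=self._rate)
        if rng.random() < self.epsilon:
            offer = rng.choice(live_offers)  # explore uniformly over live offers
        else:
            offer = best                     # exploit the current best estimate
        # Probability the policy had of showing `offer`; logged for off-policy correction.
        propensity = self.epsilon / len(live_offers)
        if offer == best:
            propensity += 1 - self.epsilon
        # Nothing is learned yet: the reward is unknown until the outcome arrives.
        self.pending[impression_id] = (offer, propensity)
        return offer, propensity

    def record_outcome(self, impression_id, converted):
        """Call when a conversion arrives or the attribution window closes."""
        offer, propensity = self.pending.pop(impression_id)
        self.shows[offer] = self.shows.get(offer, 0) + 1
        self.conversions[offer] = self.conversions.get(offer, 0) + int(converted)
        return offer, propensity


def ips_value(logs, target_offer):
    """Inverse-propensity estimate of the conversion rate of always showing target_offer.

    `logs` is a list of (offer_id, propensity, reward) tuples from resolved impressions.
    """
    total = sum(reward / p for offer, p, reward in logs if offer == target_offer)
    return total / len(logs)
```

The split between `choose` and `record_outcome` mirrors the delay in the question: the serving path only reads counts and logs a propensity, while learning happens later when the outcome is joined back by impression ID. The logged propensity is what lets an estimator such as `ips_value` correct for the fact that, unlike in an A/B test, offers were not shown with equal probability.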