Code Room
System design · Hard
Question
Design an end-to-end training pipeline for a click-through-rate (CTR) model that retrains daily on ~5B labeled impressions. The pipeline must produce a reproducible, validated model artifact and feed a daily-refreshed online predictor. Time to production (data ready → model live) should be under 4 hours, and a bad model must never reach 100% of traffic. Cover data ingestion and labeling, orchestration, validation gates, and how offline training stays consistent with online serving.
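To put the numbers in the question in perspective, here is a rough back-of-envelope sketch of the throughput they imply. The 5B impressions and the 4-hour window come from the question; the share of the window given to training (`TRAIN_SHARE`) and the single pass over the data are assumptions made only for illustration.

```python
# Back-of-envelope throughput for the figures stated in the question.
IMPRESSIONS_PER_DAY = 5_000_000_000  # ~5B labeled impressions (from the question)
SLA_HOURS = 4                        # data ready -> model live (from the question)
TRAIN_SHARE = 0.5                    # ASSUMPTION: half of the window is spent training

SECONDS_PER_DAY = 24 * 60 * 60


def avg_ingest_rate(impressions_per_day: int) -> float:
    """Average impressions per second if traffic were spread evenly over a day."""
    return impressions_per_day / SECONDS_PER_DAY


def required_train_rate(impressions_per_day: int, sla_hours: float,
                        train_share: float) -> float:
    """Examples per second needed to make ONE pass over a day's data
    within the training share of the latency budget."""
    train_seconds = sla_hours * 3600 * train_share
    return impressions_per_day / train_seconds
```

Under these assumptions the pipeline ingests roughly 58 thousand impressions per second on average and must train at roughly 700 thousand examples per second. Real traffic is bursty, so peak ingest would be higher than the average.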
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
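One of the hard parts worth going deep on is the requirement that a bad model never reaches 100% of traffic. A minimal sketch of one possible approach follows: an offline validation gate that compares the candidate to the current production model, then a staged rollout that only advances while the canary stays healthy. All names, metrics, thresholds, and stage sizes here (`Metrics`, `passes_offline_gate`, `ROLLOUT_STAGES`, and so on) are hypothetical and chosen for illustration, not taken from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    auc: float
    logloss: float
    calibration: float  # mean predicted CTR / observed CTR; 1.0 means calibrated


def passes_offline_gate(candidate: Metrics, baseline: Metrics,
                        max_auc_drop: float = 0.002,        # hypothetical threshold
                        max_logloss_rise: float = 0.001,    # hypothetical threshold
                        calibration_band: tuple = (0.95, 1.05)) -> bool:
    """Return True only if the candidate is no worse than the baseline
    (within tolerance) on every offline metric."""
    low, high = calibration_band
    return (
        candidate.auc >= baseline.auc - max_auc_drop
        and candidate.logloss <= baseline.logloss + max_logloss_rise
        and low <= candidate.calibration <= high
    )


# Hypothetical traffic fractions; 100% is reachable only via every smaller stage.
ROLLOUT_STAGES = (0.01, 0.05, 0.25, 1.0)


def next_traffic_fraction(current: float, canary_healthy: bool) -> float:
    """Advance one stage if the canary is healthy; otherwise roll back to 0."""
    if not canary_healthy:
        return 0.0  # roll back: the previous model serves all traffic again
    for stage in ROLLOUT_STAGES:
        if stage > current:
            return stage
    return 1.0  # already fully rolled out
```

The trade-off to name here is speed against safety: more stages and longer soak times reduce the chance of a bad model spreading, but they consume part of the 4-hour budget.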