Code Room
System design · Medium · sd-g108
Subject: Training pipeline · Level: Mid–Senior · ~40 min · Common in ML systems interviews · Industries: Technology, Software development

Question

Design the data + labeling pipeline for a self-driving perception model. You collect petabytes of sensor logs from a fleet, but only a tiny fraction is worth labeling (most frames are boring highway driving). Human labeling is expensive and slow, and rare dangerous scenarios (a child running into the road) are exactly what you most need but are rarely captured. Design the pipeline from raw fleet data to a labeled training set, focusing on how you decide what to label and how you keep getting better at finding the rare, valuable cases.

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
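
One way to make the "what to label" trade-off concrete is a budgeted selection step. The sketch below is illustrative only and not a prescribed answer: the `Frame` fields, the `rarity_weight` parameter, and the scoring formula are all hypothetical assumptions. It ranks candidate frames by estimated value per labeling dollar and greedily fills a fixed budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    frame_id: str
    uncertainty: float  # assumed: model uncertainty or disagreement, in [0, 1]
    rarity: float       # assumed: 1 - estimated scenario frequency, in [0, 1]
    label_cost: float   # assumed: estimated human labeling cost, in dollars


def priority(frame: Frame, rarity_weight: float = 2.0) -> float:
    """Estimated value per labeling dollar (hypothetical scoring formula)."""
    return (frame.uncertainty + rarity_weight * frame.rarity) / frame.label_cost


def select_for_labeling(frames, budget, rarity_weight=2.0):
    """Greedily pick the highest-priority frames that fit within the budget."""
    ranked = sorted(frames, key=lambda f: priority(f, rarity_weight), reverse=True)
    chosen, spent = [], 0.0
    for frame in ranked:
        if spent + frame.label_cost <= budget:
            chosen.append(frame)
            spent += frame.label_cost
    return chosen


# Made-up example values, for illustration only.
candidates = [
    Frame("highway", uncertainty=0.05, rarity=0.0, label_cost=1.0),
    Frame("pedestrian", uncertainty=0.6, rarity=0.9, label_cost=2.0),
    Frame("construction", uncertainty=0.7, rarity=0.5, label_cost=1.0),
]
selected = select_for_labeling(candidates, budget=3.0)
print([f.frame_id for f in selected])
```

The trade-off to name here is that a higher `rarity_weight` pulls the budget toward long-tail scenarios at the risk of skewing the training distribution, so a real pipeline would probably also reserve part of the budget for a random sample.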
