System design · Hard · sd-g360

Subject: Training pipeline
Level: Senior–Staff
~45 min
Common in ML systems interviews
Industries: Technology, Software development

Question

Design a training pipeline that guarantees full reproducibility and lineage for models that make regulated lending decisions. Months later, when an auditor asks or a customer disputes a decision, you must be able to reconstruct exactly which model scored a given application, on which code, with which hyperparameters, and trained on exactly which data snapshot, and then re-run training to reproduce the same result. Walk through how you version data, code, and config together, how you capture lineage from raw data through to a deployed model, and how you actually achieve bit-level or metric-level reproducibility across a multi-stage pipeline.

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
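One way to make "version data, code, and config together" concrete is a single content-addressed run manifest. The sketch below is illustrative only: `RunManifest`, `build_manifest`, and the field names are hypothetical, the data snapshot is passed as in-memory bytes for brevity, and a real pipeline would hash files in object storage and record the manifest in a metadata store alongside the deployed model.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Mapping


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so that
    logically equal objects always produce identical bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


@dataclass(frozen=True)
class RunManifest:
    # Hypothetical record tying one training run to its exact inputs.
    data_snapshot: str  # content hash over every file in the snapshot
    code_commit: str    # VCS commit identifier of the training code
    config_hash: str    # hash of the canonicalized hyperparameters
    seed: int           # RNG seed used for the run

    def lineage_id(self) -> str:
        """Single identifier that changes if any input changes."""
        return sha256_hex(canonical_json(asdict(self)))


def build_manifest(
    data_files: Mapping[str, bytes],
    code_commit: str,
    config: Mapping[str, object],
    seed: int,
) -> RunManifest:
    # Hash each file, then hash the sorted (name, digest) pairs so the
    # snapshot hash does not depend on file enumeration order.
    file_digests = sorted((name, sha256_hex(blob)) for name, blob in data_files.items())
    return RunManifest(
        data_snapshot=sha256_hex(canonical_json(file_digests)),
        code_commit=code_commit,
        config_hash=sha256_hex(canonical_json(dict(config))),
        seed=seed,
    )


if __name__ == "__main__":
    manifest = build_manifest(
        data_files={"applications.csv": b"id,income\n1,50000\n"},
        code_commit="abc123",  # placeholder commit identifier
        config={"learning_rate": 0.1, "max_depth": 6},
        seed=42,
    )
    print(manifest.lineage_id())
```

Storing the resulting lineage ID with every scored application lets a later audit look up the manifest and recover the data snapshot, commit, config, and seed for the model that made the decision. It does not by itself guarantee bit-level reproducibility, which also depends on pinned dependencies, hardware, and deterministic kernels.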
