Question
Design a training pipeline that guarantees full reproducibility and lineage for models that make regulated lending decisions. Months later, an auditor (or a customer dispute) must let you reconstruct exactly which model scored a given application, on which code, with which hyperparameters, trained on exactly which data snapshot — and re-run it to reproduce the same result. Walk through how you version data, code, and config together, how you capture lineage from raw data through to a deployed model, and how you actually achieve bit-or-metric-level reproducibility across a multi-stage pipeline.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.