Code Room
On-call · Hard · oc-g576
Subject: Feature pipeline late data
Level: Senior–Staff
~35 min
Common in ML systems interviews
Industries: Technology

Question

Your daily batch feature pipeline (Airflow + Spark) materializes aggregate features (e.g., 'avg_session_len_7d') into the online store every morning before the model uses them. Today a 'feature completeness' check fires: ~15% of users have features computed from only partial input data. Dashboards show that one upstream source table (clickstream) landed 4 hours late and incomplete today because of an upstream outage. The feature DAG nonetheless ran at its usual scheduled time and published partial aggregates, overwriting yesterday's good values. The model is now serving on these thin, partial features. Triage and respond.
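To make the 'feature completeness' signal concrete, here is a minimal sketch of one way such a check could be computed. It is an assumption, not the check described in the scenario: the function name `partial_user_fraction`, the per-user baseline of expected event counts, and the `min_ratio` threshold are all hypothetical.

```python
from typing import Dict


def partial_user_fraction(
    today_events: Dict[str, int],
    baseline_events: Dict[str, float],
    min_ratio: float = 0.5,
) -> float:
    """Return the fraction of baseline users whose event count today
    is below min_ratio of their trailing baseline (hypothetical rule)."""
    if not baseline_events:
        return 0.0
    thin = 0
    for user_id, expected in baseline_events.items():
        # Users missing from today's input count as zero observed events.
        observed = today_events.get(user_id, 0)
        if expected > 0 and observed < min_ratio * expected:
            thin += 1
    return thin / len(baseline_events)


if __name__ == "__main__":
    baseline = {"a": 10, "b": 10, "c": 10, "d": 10}
    today = {"a": 10, "b": 9, "c": 2}  # "c" is thin, "d" is missing
    print(partial_user_fraction(today, baseline))
```

A number like this helps size the blast radius during triage, but it only detects the problem after the fact; it does not stop partial aggregates from being published.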

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
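One possible shape for both the mitigation and the prevention step is a publish gate: if the upstream source is late or short of its expected volume, keep serving the last known-good values instead of overwriting them. The sketch below is illustrative only. `SourceStatus`, `source_is_complete`, `choose_features_to_publish`, and the 0.98 threshold are hypothetical names and values, not part of the scenario or of any Airflow or Spark API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SourceStatus:
    """Hypothetical summary of one upstream table for today's partition."""
    rows_landed: int
    rows_expected: int
    landed_on_time: bool


def source_is_complete(status: SourceStatus, min_ratio: float = 0.98) -> bool:
    """True only if the source landed on time with enough rows (assumed rule)."""
    if status.rows_expected <= 0:
        return False
    ratio = status.rows_landed / status.rows_expected
    return status.landed_on_time and ratio >= min_ratio


def choose_features_to_publish(
    new_features: Dict[str, float],
    last_good_features: Dict[str, float],
    status: SourceStatus,
) -> Dict[str, float]:
    """Publish fresh aggregates only when the input passed the gate;
    otherwise fall back to the last known-good snapshot."""
    if source_is_complete(status):
        return new_features
    return last_good_features


if __name__ == "__main__":
    late = SourceStatus(rows_landed=600, rows_expected=1000, landed_on_time=False)
    print(choose_features_to_publish({"u1": 1.0}, {"u1": 42.0}, late))
```

In a real DAG this decision would typically live in a sensor or short-circuit task ahead of the publish step, and it assumes yesterday's snapshot is retained somewhere it can be restored from. Serving day-old features is a trade-off worth stating out loud: slightly stale values are usually less harmful than aggregates built from partial input.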
