Code Room
System design · Hard · sd-g439
Subject: ETL · Level: Senior–Staff · ~40 min · Common in: Distributed systems interviews · Industries: Technology, Software development

Question

Design a streaming ETL that enriches a high-volume click/event stream (80k/sec) by joining it to slowly-changing reference data (user profiles, product catalog, ~500M rows updated via their own CDC stream) and a second event stream (impressions, joined to clicks within a 30-minute window). You need low latency, must handle reference updates and out-of-order/late events on both streams, and can't fit all reference data in memory on every worker. Design the join strategy, state, and the correctness pitfalls.
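
A quick sizing pass helps decide whether reference data can be broadcast or must be partitioned. The sketch below is hypothetical. Only the 80k events/sec rate, the 30-minute window and the 500M reference rows come from the question. The impression rate, lateness allowance, per-record sizes and worker count are placeholder assumptions, so the outputs show orders of magnitude and are not estimates.

```python
# Back-of-envelope state sizing for the join operators.
# From the question: CLICK_RATE, WINDOW_S, REF_ROWS.
# Everything else is a hypothetical placeholder for illustration.
CLICK_RATE = 80_000             # click/event stream, events per second
WINDOW_S = 30 * 60              # click-impression join window, seconds
REF_ROWS = 500_000_000          # reference rows (profiles + catalog)

ALLOWED_LATENESS_S = 10 * 60    # hypothetical: extra retention for late events
IMPRESSION_RATE = 400_000       # hypothetical: the question gives no impression rate
EVENT_BYTES = 300               # hypothetical: serialized event size in state
REF_ROW_BYTES = 200             # hypothetical: serialized reference row size
WORKERS = 64                    # hypothetical: parallelism of the join stage
GB = 10**9


def sizing() -> dict[str, float]:
    """Return approximate state sizes in GB, in total and per worker."""
    retention_s = WINDOW_S + ALLOWED_LATENESS_S  # how long an event must stay joinable
    click_gb = CLICK_RATE * retention_s * EVENT_BYTES / GB
    impression_gb = IMPRESSION_RATE * retention_s * EVENT_BYTES / GB
    reference_gb = REF_ROWS * REF_ROW_BYTES / GB
    total_gb = click_gb + impression_gb + reference_gb
    return {
        "click_buffer_gb": click_gb,
        "impression_buffer_gb": impression_gb,
        "reference_gb": reference_gb,
        "total_gb": total_gb,
        "per_worker_gb": total_gb / WORKERS,
    }


for name, value in sizing().items():
    print(f"{name:>22}: {value:8.1f}")
```

With these placeholders, the reference table alone is about 100 GB, and the two join buffers are of the same order. This matches the question's premise that reference data cannot sit in memory on every worker. The state is therefore key-partitioned across workers, often backed by local disk, rather than replicated.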

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
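
For this question, one way to make "go deep on the hard parts" concrete is a small, framework-free Python simulation of the two joins. It is a sketch under stated assumptions and not a reference solution. `Event`, `VersionedTable`, `IntervalJoin`, `WINDOW_MS` and `ALLOWED_LATENESS_MS` are hypothetical names, not any engine's API. The sketch makes several further assumptions:

- Each click carries the `impression_id` it belongs to.
- "Within a 30-minute window" means the click happens 0 to 30 minutes after its impression.
- The operator watermark is the minimum of the per-input maximum event times minus a fixed, hypothetical 5-minute lateness.
- In-memory dicts stand in for key-partitioned state.

The sketch shows four things: an as-of lookup against versioned CDC rows, symmetric buffering (either stream may arrive first), watermark-driven eviction, and side outputs for late and unmatched events. This is roughly what engines such as Apache Flink offer as interval joins and event-time temporal joins.

```python
from __future__ import annotations

import bisect
from collections import defaultdict
from dataclasses import dataclass

WINDOW_MS = 30 * 60 * 1000           # from the question: 30-minute join window
ALLOWED_LATENESS_MS = 5 * 60 * 1000  # hypothetical watermark delay


@dataclass
class Event:
    key: str      # hypothetical join key: the impression_id a click refers to
    ts: int       # event time, epoch milliseconds
    user_id: str  # key into the slowly-changing profile table


class VersionedTable:
    """Reference data fed by CDC. Keeps every version per key so a lookup
    returns the row valid at the event's time, not the latest row."""

    def __init__(self) -> None:
        self._ts: dict[str, list[int]] = defaultdict(list)
        self._rows: dict[str, list[dict]] = defaultdict(list)

    def apply_cdc(self, key: str, valid_from: int, row: dict) -> None:
        # CDC records may arrive out of order, so insert in timestamp order.
        i = bisect.bisect_right(self._ts[key], valid_from)
        self._ts[key].insert(i, valid_from)
        self._rows[key].insert(i, row)

    def as_of(self, key: str, ts: int) -> dict | None:
        i = bisect.bisect_right(self._ts.get(key, []), ts)
        return self._rows[key][i - 1] if i else None


class IntervalJoin:
    """Symmetric event-time join: 0 <= click.ts - impression.ts <= WINDOW_MS.
    Late and unmatched events go to side outputs instead of vanishing."""

    def __init__(self, profiles: VersionedTable) -> None:
        self.profiles = profiles
        self.impressions: dict[str, Event] = {}
        self.clicks: dict[str, list[Event]] = defaultdict(list)
        self.max_ts = {"click": 0, "impression": 0}  # 0 = nothing seen yet
        self.out: list[dict] = []
        self.late: list[Event] = []
        self.unmatched: list[Event] = []

    def watermark(self) -> int:
        # Minimum across inputs: an idle or lagging stream holds everything back.
        return min(self.max_ts.values()) - ALLOWED_LATENESS_MS

    def _emit(self, imp: Event, click: Event) -> None:
        if 0 <= click.ts - imp.ts <= WINDOW_MS:
            # Enrich with the profile version valid at click time, not "now".
            profile = self.profiles.as_of(click.user_id, click.ts)
            self.out.append({"key": click.key, "click_ts": click.ts,
                             "impression_ts": imp.ts, "profile": profile})

    def on_impression(self, imp: Event) -> None:
        if imp.ts < self.watermark():
            self.late.append(imp)
            return
        self.max_ts["impression"] = max(self.max_ts["impression"], imp.ts)
        self.impressions[imp.key] = imp
        for click in self.clicks.pop(imp.key, []):  # clicks that arrived first
            self._emit(imp, click)
        self._evict()

    def on_click(self, click: Event) -> None:
        if click.ts < self.watermark():
            self.late.append(click)
            return
        self.max_ts["click"] = max(self.max_ts["click"], click.ts)
        imp = self.impressions.get(click.key)
        if imp is not None:
            self._emit(imp, click)
        else:
            self.clicks[click.key].append(click)  # impression may still arrive
        self._evict()

    def _evict(self) -> None:
        wm = self.watermark()
        # Past imp.ts + WINDOW_MS, any click that could still match would be late.
        for k in [k for k, e in self.impressions.items() if e.ts + WINDOW_MS < wm]:
            del self.impressions[k]
        # A buffered click only matches impressions with ts <= click.ts.
        for k in list(self.clicks):
            self.unmatched.extend(c for c in self.clicks[k] if c.ts < wm)
            self.clicks[k] = [c for c in self.clicks[k] if c.ts >= wm]
            if not self.clicks[k]:
                del self.clicks[k]
```

In a real deployment, the two lookups are keyed differently: `impression_id` for the stream-stream join, and `user_id` or product ID for enrichment. They would typically run as separate stages with a repartition between them, each holding only its key range of state. That is how the 500M-row reference data avoids being copied to every worker.

The sketch only gestures at two pitfalls:

- **Stale lookups.** The as-of lookup is only correct once the CDC stream has caught up past the click's timestamp. Until then, it silently returns a stale version.
- **Version pruning.** Old versions in the table need pruning as the watermark advances, keeping the newest version at or before the watermark.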
