Code Room
System design · Hard
Question
Design a streaming ETL pipeline that enriches a high-volume click/event stream (80k events/sec) by joining it to slowly changing reference data (user profiles, product catalog, ~500M rows updated via their own CDC stream) and to a second event stream (impressions, joined to clicks within a 30-minute window). You need low latency, must handle reference updates and out-of-order/late events on both streams, and can't fit all reference data in memory on every worker. Design the join strategy and the state it requires, and identify the correctness pitfalls.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
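To make the "hard parts" concrete, here is a minimal single-process sketch of the click-to-impression join under event time. It is one possible shape for an answer, not a reference solution. The 30-minute window comes from the question; everything else is an assumption made for illustration: the `Event` type, the string join key, the 5-minute allowed lateness, and the rule that the watermark is the minimum of the two streams' highest timestamps minus that lateness. A real deployment would shard this state by key and keep it in a durable state store rather than in Python dicts.

```python
import math
from dataclasses import dataclass

WINDOW_MS = 30 * 60 * 1000           # join window stated in the question
ALLOWED_LATENESS_MS = 5 * 60 * 1000  # assumption: the question gives no value


@dataclass(frozen=True)
class Event:
    key: str   # hypothetical join key (e.g. an impression or session id)
    ts: int    # event time in milliseconds
    kind: str  # "impression" or "click"


class WindowedJoin:
    """Buffers both streams and joins a click to an impression with the
    same key when 0 <= click.ts - impression.ts <= WINDOW_MS."""

    def __init__(self):
        self.impressions = {}  # key -> list of impression timestamps
        self.clicks = {}       # key -> list of click timestamps
        self.max_ts = {"impression": -math.inf, "click": -math.inf}
        self.late = []         # side output for events behind the watermark

    def watermark(self):
        # The slower stream holds the watermark back, so a lagging
        # stream does not cause the other side's state to be evicted early.
        return min(self.max_ts.values()) - ALLOWED_LATENESS_MS

    def process(self, ev):
        """Returns a list of (key, impression_ts, click_ts) matches."""
        if ev.ts < self.watermark():
            self.late.append(ev)  # too late: matching state may be gone
            return []
        self.max_ts[ev.kind] = max(self.max_ts[ev.kind], ev.ts)

        if ev.kind == "impression":
            matches = [(ev.ts, c) for c in self.clicks.get(ev.key, [])
                       if 0 <= c - ev.ts <= WINDOW_MS]
            self.impressions.setdefault(ev.key, []).append(ev.ts)
        else:
            matches = [(i, ev.ts) for i in self.impressions.get(ev.key, [])
                       if 0 <= ev.ts - i <= WINDOW_MS]
            self.clicks.setdefault(ev.key, []).append(ev.ts)

        self._evict()
        return [(ev.key, i, c) for i, c in matches]

    def _evict(self):
        # Full scan per event keeps the sketch short; a real system
        # would use timers or a time-ordered index instead.
        wm = self.watermark()
        # An impression can still match clicks up to WINDOW_MS after it.
        self._trim(self.impressions, lambda t: t + WINDOW_MS >= wm)
        # A click only matches impressions at or before its own timestamp.
        self._trim(self.clicks, lambda t: t >= wm)

    @staticmethod
    def _trim(buffer, keep):
        for key in list(buffer):
            kept = [t for t in buffer[key] if keep(t)]
            if kept:
                buffer[key] = kept
            else:
                del buffer[key]
```

The sketch surfaces the trade-offs a strong answer should name: buffering both sides lets a click that arrives before its impression still join, the allowed lateness trades state size against dropped matches, and late events go to a side output instead of being silently discarded. The reference-data join (profiles, catalog) is deliberately left out here; it needs a separate versioned lookup keyed on the CDC stream.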