Code Room
System design · Hard
Question
Design a change-data-capture (CDC) pipeline that streams every row change from a 5 TB MySQL OLTP cluster (40k writes/sec) to a data lake, a search index, and a cache invalidator. Consumers need changes within 5 seconds, in commit order per row, with no lost or duplicated effects after a crash. How do you capture changes, preserve ordering, and guarantee delivery semantics?
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
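As a rough illustration of the first two steps (sizing, then per-row ordering and crash-safe delivery), here is a minimal Python sketch. It is one possible approach, not a reference answer. The average event size, the partition count, and the names `partition_for` and `IdempotentApplier` are assumptions made for illustration. Only the 40k writes/sec figure comes from the question. The sketch assumes each change carries a monotonically increasing commit position (for example, derived from the source log offset).

```python
import zlib
from dataclasses import dataclass, field

WRITES_PER_SEC = 40_000   # from the question
AVG_EVENT_BYTES = 1_000   # ASSUMPTION: average change-event size, not given in the question
NUM_PARTITIONS = 64       # ASSUMPTION: hypothetical partition count for the change log


def stream_rate_mb_per_sec(writes_per_sec: int, avg_event_bytes: int) -> float:
    """Back-of-envelope sustained throughput of the change stream in MB/s."""
    return writes_per_sec * avg_event_bytes / 1_000_000


def partition_for(table: str, primary_key: str,
                  num_partitions: int = NUM_PARTITIONS) -> int:
    """Route every change for a given row to the same partition.

    Ordering is only needed per row, so a stable hash of (table, primary key)
    keeps one row's changes in a single ordered partition while spreading
    different rows across partitions for parallelism.
    """
    key = f"{table}:{primary_key}".encode("utf-8")
    return zlib.crc32(key) % num_partitions


@dataclass
class IdempotentApplier:
    """Hypothetical consumer-side guard that makes redelivery harmless.

    It remembers the highest commit position applied per row and skips
    anything at or below it, so at-least-once delivery produces
    effectively-once results.
    """
    last_applied: dict = field(default_factory=dict)
    state: dict = field(default_factory=dict)

    def apply(self, row_key: str, position: int, value: str) -> bool:
        if position <= self.last_applied.get(row_key, -1):
            return False  # duplicate or stale replay after a crash: ignore
        self.state[row_key] = value
        self.last_applied[row_key] = position
        return True


if __name__ == "__main__":
    print(stream_rate_mb_per_sec(WRITES_PER_SEC, AVG_EVENT_BYTES), "MB/s")
    applier = IdempotentApplier()
    applier.apply("users:1", 10, "v1")
    applier.apply("users:1", 10, "v1")  # replayed event, skipped
    print(applier.state)
```

In a real sink, the row update and the stored position would need to be committed together (for example, in one transaction); otherwise a crash between the two writes could still lose or repeat an effect. That atomicity is the trade-off worth naming when discussing delivery semantics.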