Code Room
System design · Hard
Question
You're bootstrapping CDC (change data capture) on a hot 3 TB Postgres table. A full snapshot takes ~6 hours, and the table receives ~5k writes/sec the whole time. You must produce a downstream copy that is eventually consistent with the source, with no gaps and no incorrectness caused by duplicates. You can't lock the table for 6 hours, and you can't afford to miss writes that happen during the snapshot. Design the snapshot-to-stream handoff so the consumer ends up with exactly the right final state, and explain how you recover from a mid-snapshot failure.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
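As one illustration of the consistency part of such an answer, here is a minimal sketch, not a reference solution. It assumes every snapshot row is tagged with the log position (LSN) at which the snapshot was taken, every streamed change carries its own LSN, and LSNs for a given key only increase. The names `Replica`, `apply`, `live_rows`, and `SNAPSHOT_LSN` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy downstream copy: pk -> (lsn of last applied write, value)."""
    # A value of None is a tombstone left behind by a delete.
    rows: dict = field(default_factory=dict)

    def apply(self, pk, value, lsn):
        """Apply a write only if it is newer than what is already stored."""
        current = self.rows.get(pk)
        if current is not None and current[0] >= lsn:
            return False  # stale snapshot row or duplicate delivery: no-op
        self.rows[pk] = (lsn, value)
        return True

    def live_rows(self):
        """Final visible state, with tombstones filtered out."""
        return {pk: v for pk, (_, v) in self.rows.items() if v is not None}


# Assumed: the snapshot is consistent as of this log position.
SNAPSHOT_LSN = 100

replica = Replica()
replica.apply(1, "a-v2", 120)           # stream update lands before the snapshot chunk
replica.apply(2, None, 130)             # stream delete, stored as a tombstone
replica.apply(1, "a-v1", SNAPSHOT_LSN)  # late snapshot row: ignored, older LSN
replica.apply(2, "b-v1", SNAPSHOT_LSN)  # late snapshot row: must not resurrect pk 2
replica.apply(3, "c-v1", SNAPSHOT_LSN)  # row untouched during the snapshot
replica.apply(1, "a-v2", 120)           # duplicate delivery after a retry: no-op

final_state = replica.live_rows()
print(final_state)
```

Because each apply is idempotent and gated on the LSN, re-running snapshot chunks or replaying the stream after a mid-snapshot failure re-delivers rows without changing the final state. The trade-offs worth naming are that tombstones must be kept until the snapshot has finished, and that the source has to retain its change log for the full snapshot window.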