Code Room
System design · Medium · sd-g431
Subject: ETL · Level: Mid–Senior · ~35 min · Common in: Distributed systems interviews · Industries: Technology, Software development

Question

Design an incremental extraction system that syncs ~200 large source tables (Postgres/MySQL) into a warehouse hourly, where full re-extracts are too slow or expensive. The naive approach polls for 'rows where updated_at > last_watermark', but it misses hard deletes (rows that vanish from the source), misses rows updated within the same second as the watermark, and double-counts rows when clocks skew. Design an incremental extract that captures inserts, updates, AND deletes correctly and is safe to re-run after a failure.
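
To make two of the failure modes concrete, here is a minimal Python sketch of the naive poll. It uses an in-memory SQLite table as a stand-in for one source table. The table name `src`, the integer-second `updated_at` column, and the helpers `naive_poll` and `demo` are assumptions invented for this illustration; they are not part of the question, and the sketch does not cover clock skew.

```python
import sqlite3


def naive_poll(conn, watermark):
    """Naive incremental extract: rows with updated_at strictly after the watermark."""
    rows = conn.execute(
        "SELECT id, val, updated_at FROM src WHERE updated_at > ? ORDER BY id",
        (watermark,),
    ).fetchall()
    # Advance the watermark to the largest timestamp seen (unchanged if no rows).
    new_watermark = max((r[2] for r in rows), default=watermark)
    return rows, new_watermark


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical source table; updated_at has one-second granularity.
    conn.execute(
        "CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO src VALUES (?, ?, ?)", [(1, "a", 100), (2, "b", 100)]
    )

    # First sync: both rows are picked up and the watermark becomes 100.
    rows, watermark = naive_poll(conn, 0)
    warehouse = {row_id: val for row_id, val, _ in rows}

    # Row 2 is updated again within the same second as the watermark.
    conn.execute("UPDATE src SET val = 'b2', updated_at = 100 WHERE id = 2")
    # Row 1 is hard-deleted; it leaves no row behind for the poll to find.
    conn.execute("DELETE FROM src WHERE id = 1")

    # Second sync: the strict '>' comparison returns nothing.
    rows, watermark = naive_poll(conn, watermark)
    for row_id, val, _ in rows:
        warehouse[row_id] = val

    source = dict(conn.execute("SELECT id, val FROM src").fetchall())
    return warehouse, source


if __name__ == "__main__":
    warehouse, source = demo()
    print("warehouse:", warehouse)
    print("source:   ", source)
```

After the second sync the warehouse copy still holds the deleted row 1 and the stale value for row 2, which is the drift the design has to eliminate.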

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
