System design · Hard · sd-g420

Subject: CDC · Level: Senior–Staff · ~45 min · Common in: Distributed systems interviews · Industries: Technology, Software development

Question

Design a CDC pipeline that streams changes from ~40 OLTP Postgres microservice databases (combined ~6 TB, ~8k writes/sec) into a lakehouse (Iceberg/Delta tables on object storage), keeping each lakehouse table an up-to-date mirror queryable by Trino/Spark within ~2 minutes of the source write. Source teams ship schema changes weekly (add/rename/drop columns, type widening) without coordinating with the data platform. The lakehouse must reflect inserts, updates, and deletes, and historical queries must still work after a column is renamed upstream.
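To make the two trickiest requirements concrete (applying inserts, updates, and deletes, and keeping old rows readable after an upstream rename), here is a toy in-memory sketch. It is not a proposed implementation: `MirrorTable`, the event dictionary shape, and the `lsn` field are hypothetical names chosen for illustration, with op codes loosely modeled on Debezium's `c`/`u`/`d`. The idea it illustrates is storing values under stable column IDs rather than names, which is the approach Iceberg takes to schema evolution.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MirrorTable:
    """Toy stand-in for one lakehouse table (hypothetical, in-memory only)."""
    name_to_id: Dict[str, int] = field(default_factory=dict)       # current column names
    rows: Dict[Any, Dict[int, Any]] = field(default_factory=dict)  # pk -> {column id: value}
    last_lsn: int = -1                                              # highest applied log position

    def rename_column(self, old: str, new: str) -> None:
        # A rename only touches metadata; stored values keep their column id.
        self.name_to_id[new] = self.name_to_id.pop(old)

    def apply(self, event: Dict[str, Any]) -> None:
        # Skip events at or below the last applied position, so replays
        # after a connector restart do not change the table.
        if event["lsn"] <= self.last_lsn:
            return
        if event["op"] == "d":
            self.rows.pop(event["pk"], None)
        else:  # "c" (create) and "u" (update) are both treated as upserts
            self.rows[event["pk"]] = {
                self.name_to_id[col]: val for col, val in event["after"].items()
            }
        self.last_lsn = event["lsn"]

    def read(self, pk: Any) -> Optional[Dict[str, Any]]:
        # Resolve column ids back to whatever the columns are called today.
        id_to_name = {cid: name for name, cid in self.name_to_id.items()}
        row = self.rows.get(pk)
        return None if row is None else {id_to_name[c]: v for c, v in row.items()}


table = MirrorTable(name_to_id={"id": 1, "email": 2})
table.apply({"lsn": 10, "op": "c", "pk": 7, "after": {"id": 7, "email": "a@x.io"}})
table.rename_column("email", "contact_email")   # upstream rename, no coordination
table.apply({"lsn": 11, "op": "c", "pk": 8, "after": {"id": 8, "contact_email": "b@x.io"}})
table.apply({"lsn": 10, "op": "c", "pk": 7, "after": {"id": 7, "contact_email": "stale"}})  # replay, ignored
table.apply({"lsn": 12, "op": "d", "pk": 8})

old_row = table.read(7)
deleted_row = table.read(8)
```

In this sketch the row written before the rename is returned under the new column name, the replayed event is ignored, and the deleted key reads back as missing. A real design would still need to address ordering across partitions, type widening, dropped columns, and how deletes are materialized in table files.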

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
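As one way to clarify scale, a quick back-of-envelope calculation turns the figures in the question into per-commit and per-day numbers. The average event size and the commit interval below are assumptions made for illustration; they are not given in the question and should be confirmed with the interviewer.

```python
# Figures stated in the question.
WRITES_PER_SEC = 8_000
NUM_DATABASES = 40
FRESHNESS_SLO_SEC = 120

# Assumptions for illustration only (not given in the question).
AVG_EVENT_BYTES = 1_000      # serialized change event, including envelope
COMMIT_INTERVAL_SEC = 60     # commit at half the SLO to leave headroom

# Change events that accumulate between two table commits, across all sources.
events_per_commit = WRITES_PER_SEC * COMMIT_INTERVAL_SEC

# Average write rate per source database (real traffic is likely skewed).
avg_writes_per_db = WRITES_PER_SEC / NUM_DATABASES

# Sustained ingest volume under the assumed event size.
ingest_mb_per_sec = WRITES_PER_SEC * AVG_EVENT_BYTES / 1e6
ingest_gb_per_day = ingest_mb_per_sec * 86_400 / 1e3

# Commits per table per day, which drives small-file and snapshot growth.
commits_per_table_per_day = 86_400 // COMMIT_INTERVAL_SEC

print(f"{events_per_commit:,} events per commit interval")
print(f"{avg_writes_per_db:.0f} writes/sec per database on average")
print(f"{ingest_mb_per_sec:.1f} MB/s, about {ingest_gb_per_day:.1f} GB/day")
print(f"{commits_per_table_per_day} commits per table per day")
```

Under these assumptions the raw throughput is modest, while the commit count per table is what puts pressure on compaction and snapshot expiry. That contrast is a useful pointer to where the deep dive should go.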
