Question
Design a CDC pipeline that streams changes from ~40 OLTP Postgres microservice databases (combined ~6 TB, ~8k writes/sec) into a lakehouse (Iceberg/Delta tables on object storage), keeping each lakehouse table an up-to-date mirror queryable by Trino/Spark within ~2 minutes of the source write. Source teams ship schema changes weekly (add/rename/drop columns, type widening) without coordinating with the data platform. The lakehouse must reflect inserts, updates, and deletes, and historical queries must still work after a column is renamed upstream.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.