Question
An event-streaming platform has ~400 topics and ~1500 producer/consumer services owned by independent teams. Producers want to evolve event schemas (add fields, deprecate fields, occasionally change a type) without coordinating a deploy with every consumer, and consumers must not crash when they read events written by an older or newer producer. The platform carries millions of events/sec across topics, and retention is multi-week, so old-format events must stay readable on replay. Design the schema-evolution and compatibility strategy so teams can move independently without breaking each other.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
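
To make the compatibility requirement concrete, here is a minimal sketch of what "must not crash on events from an older or newer producer" means on the consumer side. Everything in it is a hypothetical illustration rather than part of the question: the `Order` event shape, the `currency` default, the `coupon` field, and the `read_order` helper are all assumed, and JSON is used only to keep the example self-contained. It shows the expected behaviour, not a proposed design.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Order:
    # Hypothetical event shape known to this consumer.
    order_id: str
    amount: float
    # Assumed to have been added in a later schema version; the default
    # lets events written before the field existed still decode.
    currency: str = "USD"


def read_order(raw: bytes) -> Order:
    """Decode an event, tolerating both missing and unknown fields."""
    data = json.loads(raw)
    known_names = {f.name for f in fields(Order)}
    # Drop fields this consumer does not know about (newer producer).
    known = {k: v for k, v in data.items() if k in known_names}
    # Missing optional fields fall back to dataclass defaults (older producer).
    return Order(**known)


# Event written by an older producer: no "currency" field yet.
old_event = b'{"order_id": "a1", "amount": 9.5}'
# Event written by a newer producer: carries an extra "coupon" field.
new_event = b'{"order_id": "a2", "amount": 3.0, "currency": "EUR", "coupon": "X"}'

old_order = read_order(old_event)
new_order = read_order(new_event)
```

A consumer written this way decodes both events without changes. A type change or a newly required field with no default would not be absorbed by this pattern, which is why the question asks for an explicit strategy.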