Question
Design a distributed, replicated append-only log service (Kafka-class) to serve as the durable backbone for event streaming: producers append to topics, consumers read by offset, ordering must be preserved per partition, and the system must survive broker failures without losing acknowledged writes. Target 1 GB/s of sustained writes, thousands of partitions, multi-day retention, and consumers ranging from real-time to hours behind. Describe the log/segment storage, the replication and acknowledgement protocol, how consumers track position, and the central durability-vs-latency trade-off.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts (data model, bottlenecks, consistency, failure modes) and name the trade-offs you are making.