Code Room
System design · Medium
Question
A high-throughput stream consumer (150k msgs/sec) occasionally hits a 'poison message' it can't process: a malformed payload, a schema it can't decode, or a message that triggers a downstream 500. Currently the consumer retries the bad message forever, blocking the partition behind it and stalling the whole pipeline until an engineer manually skips it at 3am. Design poison-message handling that keeps the pipeline flowing, doesn't lose data, and lets the bad messages be inspected and reprocessed.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
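One common shape for the core mechanism is bounded retries followed by a dead-letter queue (DLQ). The sketch below is only an illustration of that idea, not a reference answer. Everything in it is hypothetical: the `Message`, `DeadLetter`, and `PoisonSafeConsumer` names, the in-memory list standing in for a durable DLQ topic, and the `committed` dict standing in for broker offset commits. It also omits backoff and any distinction between transient and permanent errors, which a full answer would need to address.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    partition: int
    offset: int
    payload: bytes


@dataclass
class DeadLetter:
    message: Message   # original message kept intact, so no data is lost
    error: str         # last failure, for later inspection
    attempts: int


@dataclass
class PoisonSafeConsumer:
    handler: Callable[[Message], None]
    max_attempts: int = 3
    # Assumption: stands in for a durable dead-letter topic or table.
    dead_letters: List[DeadLetter] = field(default_factory=list)
    # Assumption: stands in for broker offset commits (partition -> next offset).
    committed: Dict[int, int] = field(default_factory=dict)

    def _try(self, msg: Message) -> Optional[str]:
        """Run the handler up to max_attempts times; return the last error or None."""
        last_error: Optional[str] = None
        for _ in range(self.max_attempts):
            try:
                self.handler(msg)
                return None
            except Exception as exc:  # a real system would classify errors here
                last_error = repr(exc)
        return last_error

    def process(self, msg: Message) -> bool:
        """Handle one message; park it in the DLQ instead of blocking the partition."""
        error = self._try(msg)
        if error is not None:
            self.dead_letters.append(DeadLetter(msg, error, self.max_attempts))
        # Advance the offset either way so messages behind this one keep flowing.
        self.committed[msg.partition] = msg.offset + 1
        return error is None

    def replay(self) -> int:
        """Reprocess parked messages (e.g. after a fix); return how many succeeded."""
        pending, self.dead_letters = self.dead_letters, []
        succeeded = 0
        for dl in pending:
            error = self._try(dl.message)
            if error is None:
                succeeded += 1
            else:
                # Still failing: keep it parked with an updated error and attempt count.
                self.dead_letters.append(
                    DeadLetter(dl.message, error, dl.attempts + self.max_attempts)
                )
        return succeeded
```

The trade-off worth naming in this sketch is ordering: once a message is parked and the offset advances, later messages for the same key are processed ahead of it, so replay has to be safe for out-of-order and repeated delivery (for example, through idempotent handlers).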