Code Room
On-call | Medium | oc-g164
Subject: Consumer lag | Level: Mid–Senior | ~30 min | Common in: Distributed systems interviews | Industries: Technology, Software development

Question

A Kinesis-backed analytics pipeline alerts at 13:00: `GetRecords.IteratorAgeMilliseconds` has climbed to 25 minutes and is rising on the `events` stream (6 shards). The KCL consumer fleet (6 workers) is at ~20% CPU. CloudWatch shows `ReadProvisionedThroughputExceeded` is non-zero and climbing on the stream, and `GetRecords.Latency` is elevated. Recent context: a new internal dashboard team subscribed *a second consumer application* (a second KCL app, not enhanced fan-out) to the same stream this morning. Triage and mitigate the growing iterator age.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
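One hypothesis worth testing from the signals in the question is that the two polling (non-enhanced-fan-out) applications are competing for the same per-shard read budget. The sketch below only illustrates that arithmetic. It uses the documented per-shard limits for shared-throughput consumers (5 `GetRecords` calls per second and roughly 2 MB/s of reads, shared by every polling consumer of the shard). The `ConsumerApp` type, the `shard_read_pressure` helper, and all per-app rates are hypothetical, since the question gives no polling or throughput figures.

```python
from dataclasses import dataclass

# Per-shard limits for shared-throughput (polling) consumers.
# These are shared by ALL applications that poll the same shard.
MAX_GETRECORDS_CALLS_PER_SEC = 5
MAX_READ_BYTES_PER_SEC = 2 * 1024 * 1024  # documented as about 2 MB/s


@dataclass
class ConsumerApp:
    """Hypothetical description of one polling consumer application."""
    name: str
    polls_per_sec_per_shard: float  # GetRecords calls per second against one shard
    bytes_per_sec_per_shard: float  # bytes per second it tries to read from one shard


def shard_read_pressure(apps):
    """Sum the demand every app places on a single shard and compare it to the limits."""
    calls = sum(a.polls_per_sec_per_shard for a in apps)
    read_bytes = sum(a.bytes_per_sec_per_shard for a in apps)
    return {
        "calls_per_sec": calls,
        "bytes_per_sec": read_bytes,
        "calls_over_limit": calls > MAX_GETRECORDS_CALLS_PER_SEC,
        "bytes_over_limit": read_bytes > MAX_READ_BYTES_PER_SEC,
    }


# Assumed, illustrative rates; real values would come from each app's
# KCL configuration and the stream's CloudWatch metrics.
analytics = ConsumerApp("analytics", polls_per_sec_per_shard=4, bytes_per_sec_per_shard=1_500_000)
dashboard = ConsumerApp("dashboard", polls_per_sec_per_shard=4, bytes_per_sec_per_shard=1_500_000)

before = shard_read_pressure([analytics])
after = shard_read_pressure([analytics, dashboard])
print("before second app:", before)
print("after second app: ", after)
```

Under these assumed rates the analytics app fits inside the shard budget on its own, while the combined demand exceeds both limits. That pattern would be consistent with a non-zero `ReadProvisionedThroughputExceeded`, low worker CPU, and a rising iterator age, and it points at mitigations that reduce shared polling (pausing or slowing the new app, or moving it to enhanced fan-out) rather than adding workers.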
