Code Room
On-call · Hard · oc-g160
Subject: Ordering violation · Level: Senior–Staff · ~40 min · Common in: Distributed systems interviews · Industries: Technology, Software development

Question

A trade-position service consumes a Kinesis stream keyed by `account_id` (the partition key). Support reports that a handful of accounts showed a position that briefly went *negative* and then corrected itself, which points to `debit`/`credit` events being applied out of order even though they are supposed to be strictly ordered per account. Dashboards show a low `IteratorAgeMilliseconds`, so the consumers are not lagging. Recent context: at 14:00, ops resharded the stream from 8 to 16 shards (shard splits) to handle growth, and the KCL consumer fleet scaled from 8 to 16 workers around the same time. Every affected account has an `account_id` whose hash lands near a split boundary. Triage the incident and explain the ordering violation.
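
As background for the scenario, here is a minimal sketch of how a partition key maps to a shard before and after the reshard. Kinesis hashes the partition key with MD5 into a 128-bit hash key, and each shard owns a contiguous hash-key range. The equal-width ranges, the big-endian reading of the digest, and the sample account IDs are assumptions for illustration; a real stream's ranges come from its shard listing.

```python
import hashlib

HASH_SPACE = 2 ** 128  # Kinesis hash keys are 128-bit unsigned integers


def hash_key(partition_key):
    """MD5 of the partition key, read as a big-endian integer (assumed byte order)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(shard_count):
    """Assumed layout: shard_count contiguous, equal-width hash-key ranges."""
    width = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * width
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * width - 1
        ranges.append((start, end))
    return ranges


def shard_index(partition_key, ranges):
    """Index of the range that owns this partition key's hash."""
    h = hash_key(partition_key)
    for i, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return i
    raise ValueError("hash key outside all ranges")


before = even_ranges(8)   # layout before 14:00
after = even_ranges(16)   # layout after every shard is split at its midpoint

for account_id in ("acct-1001", "acct-1002", "acct-1003"):  # hypothetical IDs
    parent = shard_index(account_id, before)
    child = shard_index(account_id, after)
    print(f"{account_id}: parent range {parent} -> child range {child}")
```

The account's events written before the split live in the parent shard and those written after it live in a child shard. These are two separate record sequences, so per-account order survives only if the consumer finishes the parent before it starts reading the child.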

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
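
One possible shape for the "prevents a repeat" part is an apply-side guard that refuses to apply events out of sequence. The sketch below assumes producers stamp each event with a per-account, gap-free sequence number `seq` starting at 1. That field, along with `Event`, `AccountState`, and `PositionBook`, is hypothetical and not part of the scenario as stated.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    account_id: str
    seq: int      # hypothetical per-account sequence number set by the producer
    kind: str     # "credit" or "debit"
    amount: int


@dataclass
class AccountState:
    position: int = 0
    next_seq: int = 1                            # next sequence number to apply
    pending: dict = field(default_factory=dict)  # seq -> Event held back


class PositionBook:
    """Applies events strictly in per-account sequence order."""

    def __init__(self):
        self.accounts = {}

    def apply(self, event):
        state = self.accounts.setdefault(event.account_id, AccountState())
        if event.seq < state.next_seq:
            return "duplicate"   # already applied; ignore the replay
        state.pending[event.seq] = event
        if event.seq != state.next_seq:
            return "buffered"    # a gap exists; hold until earlier events arrive
        # Drain every consecutive event that is now ready.
        while state.next_seq in state.pending:
            ready = state.pending.pop(state.next_seq)
            delta = ready.amount if ready.kind == "credit" else -ready.amount
            state.position += delta
            state.next_seq += 1
        return "applied"


book = PositionBook()
# The debit (seq 2) is read from a child shard before the credit (seq 1)
# has been drained from the parent shard.
print(book.apply(Event("acct-1001", 2, "debit", 80)))
print(book.apply(Event("acct-1001", 1, "credit", 100)))
print(book.accounts["acct-1001"].position)
```

With the guard in place the early debit is held back rather than applied, so the position never dips below zero while the credit is still in flight. The guard is a safety net against the symptom. It does not replace finding out why a child shard was read before its parent was drained.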

Diagram & narrate the incident