Code Room
On-call · Medium
Question
Your fintech app syncs bank-account transactions through an aggregator vendor (Plaid-style). At 08:00, the nightly batch sync that pulls fresh data for all users starts failing partway through. Dashboards show the vendor returning HTTP 429 "rate limit exceeded" after the first few thousand accounts, and the rest of the batch errors out. The job has no concurrency throttle and fires requests as fast as it can. Recent context: your user base grew 40% this quarter, and someone parallelized the sync job last month to "make it finish faster." How do you triage and mitigate?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
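One possible shape for the mitigation is to cap in-flight requests and back off when the vendor answers 429. The following is a minimal sketch under stated assumptions, not the vendor's API: `RateLimited`, `backoff_delay`, `sync_all`, and the `fetch` callable are hypothetical names, and the concurrency and retry limits are placeholder values that would need to come from the vendor's documented quota.

```python
import asyncio
import random


class RateLimited(Exception):
    """Hypothetical error a client wrapper raises on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limit exceeded")
        self.retry_after = retry_after  # seconds from a Retry-After header, if any


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retrying; `attempt` is 0-based."""
    if retry_after is not None:
        return float(retry_after)  # prefer the vendor's own hint
    # Exponential backoff with full jitter so workers do not retry in lockstep.
    return random.uniform(0, min(cap, base * 2 ** attempt))


async def sync_all(account_ids, fetch, max_concurrency=5, max_attempts=5,
                   sleep=asyncio.sleep):
    """Sync every account with bounded concurrency; return the ids that failed."""
    sem = asyncio.Semaphore(max_concurrency)  # assumed limit, tune to the quota
    failed = []

    async def sync_one(account_id):
        # The slot is held while sleeping, so a 429 also reduces overall pressure.
        async with sem:
            for attempt in range(max_attempts):
                try:
                    await fetch(account_id)
                    return
                except RateLimited as exc:
                    if attempt < max_attempts - 1:
                        await sleep(backoff_delay(attempt, exc.retry_after))
            failed.append(account_id)  # retries exhausted; keep for a later re-run

    await asyncio.gather(*(sync_one(a) for a in account_ids))
    return failed
```

Returning the failed ids instead of raising lets the batch finish for everyone else, and the leftover list can be re-queued once the rate-limit window resets. This addresses the symptom only; the root cause (40% growth plus an unthrottled parallel job exceeding the quota) still needs a quota review with the vendor.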