Code Room
On-callHardoc-g432
Subject Query plan regressionLevel Senior–Staff~35 minCommon in Databases & SQL interviewsIndustries Technology, Software development

Question

A multi-tenant API endpoint is mostly fast but a subset of tenants intermittently see 5–10s responses on the same query — and which tenants are slow seems to change after each app restart. The query filters by `tenant_id` and a status. Tenants range from a handful of rows to tens of millions. The app uses server-side prepared statements via the connection pool. `EXPLAIN` for a small tenant looks great; the slow cases use a plan that's bad specifically for the huge tenants. No deploy correlates. Triage and mitigate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Diagram & narrate the incident
Loading whiteboard…
Run or narrate your approach, then ask the coach.