On-call · Medium · oc-g590

Subject: Storage snapshot restore stuck
Level: Mid–Senior
Time: ~35 min
Common in Storage & CDN interviews
Industries: Technology

Question

After an accidental table drop in production, you start a restore from a volume snapshot to bring up a recovery instance. Two hours in, the restore is crawling and the recovery DB is unusably slow, far past your stated RTO. The cloud console shows the snapshot as 'restored' quickly, but reads on the new volume have extremely high latency: the first touch of each block is very slow, while the same blocks are fast on a second read. You're under pressure because every minute of downtime is costing the business. How do you triage why the restore is so slow and get to a healthy recovery instance faster?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
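One way to turn the first-touch symptom into a real signal is to time a first and a second read at sampled offsets on the restored volume, then read the whole device once so later reads no longer pay the first-touch cost. The Python below is a minimal sketch under stated assumptions: a Linux host, a device path you pass in, and hypothetical helper names (`sample_first_touch`, `prewarm`) and a 1 MiB chunk size chosen for illustration. Note that the OS page cache alone can make a second read fast, so this is a rough check rather than proof that the storage backend is loading blocks lazily.

```python
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # 1 MiB per read; an assumed size, tune for the volume type


def time_read(fd, offset, length=CHUNK):
    """Return the seconds taken to read `length` bytes at `offset`."""
    start = time.perf_counter()
    os.pread(fd, length, offset)
    return time.perf_counter() - start


def sample_first_touch(path, samples=8, seed=0):
    """Time a first and a second read at random chunk-aligned offsets.

    Returns a list of (offset, first_seconds, second_seconds) tuples.
    """
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        chunks = max(1, size // CHUNK)
        results = []
        for _ in range(samples):
            offset = rng.randrange(chunks) * CHUNK
            first = time_read(fd, offset)
            second = time_read(fd, offset)
            results.append((offset, first, second))
        return results
    finally:
        os.close(fd)


def prewarm(path, workers=8):
    """Read every chunk once, in parallel, and return the total bytes read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        offsets = range(0, size, CHUNK)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # os.pread takes an explicit offset, so threads can share one fd.
            return sum(pool.map(lambda off: len(os.pread(fd, CHUNK, off)), offsets))
    finally:
        os.close(fd)
```

If the first reads are consistently far slower than the second reads, that supports a first-touch hypothesis, and a parallel read pass like `prewarm` (or pointing the same loop at only the files the recovery DB needs first) is one candidate mitigation to narrate.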

