Question
After an accidental table drop in production, you start a restore from a volume snapshot to bring up a recovery instance. Two hours in, the restore is crawling and the recovery DB is unusably slow, far past your stated RTO. The cloud console reported the snapshot as 'restored' quickly, but reads on the new volume have extremely high latency: throughput on the first touch of each block is terrible, while the same blocks are fast on a second read. You're under pressure because every minute of downtime is costing the business. How do you triage why the restore is so slow and get to a healthy recovery instance faster?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
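
One way to turn the "slow on first touch, fast on second read" observation into a real signal is to time both reads directly. The sketch below assumes a Linux host with read access to the restored volume. The function names `time_block_reads` and `prewarm` are hypothetical, chosen for illustration. Note that the OS page cache can also make a second read fast, so treat the numbers as a hint and not proof. If the provider loads blocks lazily from snapshot storage, which the first-touch pattern suggests, reading every block once (as `prewarm` does) is a common mitigation.

```python
import os
import time


def time_block_reads(path, offsets, block_size=1024 * 1024):
    """Read each offset twice; return (offset, first_seconds, second_seconds)."""
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in offsets:
            timings = []
            for _ in range(2):
                start = time.perf_counter()
                os.pread(fd, block_size, offset)
                timings.append(time.perf_counter() - start)
            results.append((offset, timings[0], timings[1]))
    finally:
        os.close(fd)
    return results


def prewarm(path, chunk_size=4 * 1024 * 1024):
    """Read the target once, sequentially, so every block is touched.

    Returns the number of bytes read.
    """
    total = 0
    # buffering=0 gives raw reads; a short read is fine, only b"" means EOF.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

For example, calling `time_block_reads(device, [0, 10 * 2**30, 50 * 2**30])` on the restored device (its path is whatever your instance exposes) and comparing the two timing columns gives a quick check of the hypothesis before you commit to a full `prewarm(device)` pass.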