Rewind a database to the instant before a bad write by replaying its change log onto an old snapshot.
At 03:00 someone runs DELETE FROM users with no WHERE clause. Your last full backup is from 02:00. Restoring it loses an hour of good work. Restoring nothing loses everyone.
Point-in-time recovery (PITR) gives you a third option. A database keeps a base snapshot plus a continuous write-ahead log (WAL / binlog / redo log) recording every committed change in order. To recover, you restore the snapshot, then replay the log forward — but you stop the replay at a chosen target: the instant just before the destructive transaction. You get the database exactly as it was one heartbeat before the damage.
Every committed change is appended to the write-ahead log before the data pages are touched on disk. Each record carries a log sequence number (LSN) and a commit timestamp, so the log is a totally-ordered history of the database. Archived WAL segments are shipped off to durable storage continuously; full snapshots are taken on a cadence (say nightly, or hourly for hot systems).
Recovery is mechanical: lay down the snapshot, then replay log records in order, stopping at your recovery target. In Postgres you set recovery_target_time or recovery_target_lsn; with MySQL you slice the binlog using mysqlbinlog --stop-datetime or --stop-position. The crucial rule: replay to just before the bad transaction, never up to or past it.
restore(base_snapshot) # DB now reflects 02:00
target_lsn = lsn_just_before_bad # NOT the bad txn's own LSN
for rec in wal_log: # records are LSN-ordered
if rec.lsn < target_lsn:
apply(rec) # replay this committed change
else:
break # stop: the next record is the DELETE
# DB is now exactly as it was an instant before the damage.
RPO (recovery point objective) is how much data you can lose — the gap back to your last durable log record. Continuous archiving keeps RPO near zero. RTO (recovery time objective) is how long restore + replay takes; it grows with the volume of WAL you must replay since the snapshot.
| Signal | What it measures | What controls it |
|---|---|---|
| RPO | Worst-case data-loss window if you must recover | How often WAL is archived to durable storage — continuous shipping → near-zero RPO |
| RTO | Time to restore the snapshot and replay forward | Snapshot age — more WAL to replay since the snapshot means a longer recovery |
| Storage cost | Bytes held to make PITR possible | Snapshot size × cadence + the length of WAL history you retain |
| Target granularity | How precisely you can stop the replay | Timestamp targets (clock-bound, coarse) vs exact LSN / binlog position (precise) |
Snapshot taken at 02:00. Transactions commit roughly every five minutes. At 03:00 an accidental DELETE FROM users commits — call it the bad transaction.
1. restore(snapshot @ 02:00) # base state
2. replay WAL: txn @ 02:05, 02:10, … 02:55
3. target = 02:59:59 (just before the 03:00 DELETE)
4. stop at the last record < target; do NOT apply the DELETE
You keep every legitimate change up to 02:55 and exclude the destructive delete entirely — RPO here is ~0 legitimate data lost. The RTO is dominated by restoring the 02:00 snapshot plus replaying one hour of WAL; snapshotting more frequently would shrink that replay window and speed recovery.
An accidental DELETE FROM orders committed at 14:32:10. For point-in-time recovery, what target time should you replay the WAL to?
Your team archives WAL every 15 minutes. Which objective does that 15-minute window primarily set?