Migrating a table without locking it

Build a copy of the table with the new shape on the side, keep it in sync while it fills, then swap the two in a blink.

The idea

Running a plain ALTER TABLE on a huge table can hold a lock for minutes or hours, and during that window every read and write to that table is stuck waiting. For a busy production table, that is an outage.

The online approach (the gh-ost / pt-online-schema-change pattern) never blocks the live table. It creates a shadow table with the new schema, copies existing rows over in small batches, captures live writes happening in the meantime so the shadow stays current, and then does a single instant rename to swap them. The original table serves traffic the entire time.

See it work

The original table is live and serving traffic. Press Play to migrate it with no lock.

How it works

The tool builds the shadow table, attaches a way to capture every live write, then loops copying rows in primary-key ranges so each batch is small and quick. Once the shadow has caught up, a single atomic rename inside one statement makes the swap appear instant to clients.

-- 1. shadow table = same shape as original, then the change we want
CREATE TABLE users_shadow LIKE users;
ALTER TABLE  users_shadow ADD COLUMN status VARCHAR(16) NOT NULL DEFAULT 'active';

-- 2. capture live writes so the shadow never drifts.
--    pt-osc uses triggers; gh-ost tails the binlog instead.
CREATE TRIGGER users_ins AFTER INSERT ON users
  FOR EACH ROW REPLACE INTO users_shadow (id, name) VALUES (NEW.id, NEW.name);
-- (plus matching AFTER UPDATE / AFTER DELETE triggers)

-- 3. backfill existing rows in small PK ranges, throttling between batches
SET @lo = 0;
REPEAT
  INSERT IGNORE INTO users_shadow (id, name)
  SELECT id, name FROM users
  WHERE id > @lo AND id <= @lo + 1000;      -- one batch
  SET @lo = @lo + 1000;
  -- pause here if replica lag is rising
UNTIL @lo >= (SELECT MAX(id) FROM users) END REPEAT;

-- 4. atomic cut-over: both renames in one statement, sub-second
RENAME TABLE users TO users_old, users_shadow TO users;
DROP TABLE users_old;   -- once you are confident

Trade-offs

ConcernNaive ALTEROnline migration
DowntimeLocked for the whole rebuildNone — only a sub-second rename
Wall-clock timeFaster overallSlower, paced into batches
Disk overheadRoughly one extra copyFull second copy until swap
RollbackHard to interrupt mid-rebuildAbort any time before the swap
Write captureNot neededTriggers (pt-osc) or binlog (gh-ost)

Watch out for

Worked example

Say you need to add a NOT NULL column to a 50M-row table on a busy service.

A naive ALTER TABLE rebuilds the whole table under a lock and takes about 40 minutes — 40 minutes where every write to that table errors or hangs. That is a real outage.

The online migration instead backfills in 1,000-row batches, throttling between them to keep replica lag in check, finishing in roughly 2 hours of background work with zero downtime. Along the way it captures the ~3,000 live writes that landed during the copy and applies them to the shadow, so it stays in sync. The final RENAME TABLE swap completes in under a second. Slower on the clock, but the table never stopped serving.

Check yourself

Why is the original table able to keep serving reads and writes for the whole migration?

Mid-migration a user updates a row that was already copied. What keeps the shadow correct?