Migrating a table without locking it

Build a copy of the table with the new shape on the side, keep it in sync while it fills, then swap the two in a blink.

The idea

Running a plain ALTER TABLE on a huge table can hold a lock for minutes or hours, and during that window every read and write to that table is stuck waiting. For a busy production table, that is an outage.

The online approach (the gh-ost / pt-online-schema-change pattern) never blocks the live table. It creates a shadow table with the new schema, copies existing rows over in small batches, captures live writes happening in the meantime so the shadow stays current, and then does a single instant rename to swap them. The original table serves traffic the entire time.

See it work

The original table is live and serving traffic. Press Play to migrate it with no lock.

How it works

The tool builds the shadow table, attaches a way to capture every live write, then loops copying rows in primary-key ranges so each batch is small and quick. Once the shadow has caught up, a single atomic rename inside one statement makes the swap appear instant to clients.

-- 1. shadow table = same shape as original, then the change we want
CREATE TABLE users_shadow LIKE users;
ALTER TABLE  users_shadow ADD COLUMN status VARCHAR(16) NOT NULL DEFAULT 'active';

-- 2. capture live writes so the shadow never drifts.
--    pt-osc uses triggers; gh-ost tails the binlog instead.
CREATE TRIGGER users_ins AFTER INSERT ON users
  FOR EACH ROW REPLACE INTO users_shadow (id, name) VALUES (NEW.id, NEW.name);
-- (plus matching AFTER UPDATE / AFTER DELETE triggers)

-- 3. backfill existing rows in small PK ranges, throttling between batches
SET @lo = 0;
REPEAT
  INSERT IGNORE INTO users_shadow (id, name)
  SELECT id, name FROM users
  WHERE id > @lo AND id <= @lo + 1000;      -- one batch
  SET @lo = @lo + 1000;
  -- pause here if replica lag is rising
UNTIL @lo >= (SELECT MAX(id) FROM users) END REPEAT;

-- 4. atomic cut-over: both renames in one statement, sub-second
RENAME TABLE users TO users_old, users_shadow TO users;
DROP TABLE users_old;   -- once you are confident

Trade-offs

Concern	Naive ALTER	Online migration
Downtime	Locked for the whole rebuild	None — only a sub-second rename
Wall-clock time	Faster overall	Slower, paced into batches
Disk overhead	Roughly one extra copy	Full second copy until swap
Rollback	Hard to interrupt mid-rebuild	Abort any time before the swap
Write capture	Not needed	Triggers (pt-osc) or binlog (gh-ost)

Watch out for

It roughly doubles disk usage for the table while the copy exists — make sure you have headroom before starting.
Foreign keys complicate the rename: child tables point at the old name, so the tooling needs special handling (or you drop and recreate constraints).
A long backfill drives writes onto your replicas and can add replication lag; throttle batches and pause when lag climbs to protect production.
The cut-over rename still needs a brief metadata lock; a long-running query holding that table can stall the swap.
Watch for unique-key collisions during backfill when live writes and copied rows race — use INSERT IGNORE / REPLACE so the freshest version wins.

Worked example

Say you need to add a NOT NULL column to a 50M-row table on a busy service.

A naive ALTER TABLE rebuilds the whole table under a lock and takes about 40 minutes — 40 minutes where every write to that table errors or hangs. That is a real outage.

The online migration instead backfills in 1,000-row batches, throttling between them to keep replica lag in check, finishing in roughly 2 hours of background work with zero downtime. Along the way it captures the ~3,000 live writes that landed during the copy and applies them to the shadow, so it stays in sync. The final RENAME TABLE swap completes in under a second. Slower on the clock, but the table never stopped serving.

Check yourself

Why is the original table able to keep serving reads and writes for the whole migration?

Mid-migration a user updates a row that was already copied. What keeps the shadow correct?