Every write to the table is really two writes — the row, and the index entry that points at it.
A base table is keyed by its primary key. To find rows by some other column — say email — without scanning every row, you build a secondary index: a separate structure sorted by that column, where each entry maps email → pk.
The catch is that the index is a copy of part of the data, so it can drift out of sync. Every base-table write has to maintain it: an insert adds an entry, a delete removes one, and an update of the indexed column must remove the old entry and insert a new one — because the entry's sorted position changes. Get that delete-old step wrong and the index points at stale data.
The index is sorted by (indexed_column, pk) so lookups are a binary search. Every write touches both structures inside one transaction, so the index can never diverge from the table. The update case is the subtle one: the old entry can't be edited in place, because changing the value moves where it sorts — you must delete the old key and insert the new one.
def insert(pk, email, row):
table[pk] = row
index.insert((email, pk)) # new entry at sorted position
def update(pk, email_old, email_new, row):
table[pk] = row
if email_old != email_new:
index.delete((email_old, pk)) # remove stale entry
index.insert((email_new, pk)) # add new entry
# both under one transaction so the index never diverges
def delete(pk, email):
del table[pk]
index.delete((email, pk)) # drop the entry too
def find_by_email(email):
pk = index.seek(email) # O(log n) into the index
return table[pk] # then fetch the row by pk
| Operation | Work | Why |
|---|---|---|
| Find by email | O(log n) seek + 1 fetch | Binary-search the index, then read the row by pk — vs O(n) full scan with no index |
| Insert / delete row | 1 row write + 1 index write | The entry is added or removed alongside the row |
| Update indexed column | 1 row write + 2 index writes | Delete old entry, insert new one — write amplification |
| Storage | +1 sorted copy per index | The indexed column and pk are duplicated; more indexes means more disk |
find_by_email(old) still returns the row — an orphaned, wrong result. Always pair delete-old with insert-new.email → pk, the read still has to fetch the row by pk. A covering index that stores the needed columns avoids that second hop — at the cost of more storage.The table holds users keyed by pk, indexed by email. Row pk=7 has email = "a@x", so the index contains the entry ("a@x", 7). The user changes their address: update(7, "a@x", "b@x", row). The table row is rewritten, then the index does two ops — delete(("a@x", 7)) removes the stale entry, and insert(("b@x", 7)) places the new one at its sorted position. Now find_by_email("b@x") seeks to pk 7 and fetches the row. Had you skipped the delete, find_by_email("a@x") would still seek to pk 7 and return the row — a stale hit pointing at someone who no longer has that email.
You update the indexed column of a row from "a@x" to "b@x". What must happen to the index?
Coach note: the index is sorted by the value, so a new value belongs at a different position — you can't edit in place, and leaving the old entry behind makes lookups return stale rows.