An index is the book's table of contents — it lets the database jump straight to a row instead of reading every page.
Imagine a phone book where the names are in no particular order. To find one person, you'd have to read every entry from the first page to the last. That is what a database does when it runs a sequential scan: it examines every row in the table.
An index is a separate, sorted copy of one column (or a few columns) with a pointer back to each row. Because it is sorted, the database can do an index seek — descend a tree to land near the answer in a handful of steps. The cost drops from looking at n rows to about log n.
SELECT * FROM users WHERE id = 42
You declare an index on the column(s) you filter or join on. The database builds and maintains a sorted structure — almost always a B-tree — whose leaves hold the indexed values together with a pointer (row id) to the full row.
-- Build a B-tree index on the column used in the WHERE clause
CREATE INDEX idx_users_id ON users (id);
-- This query can now seek instead of scan
SELECT * FROM users WHERE id = 42;
To resolve id = 42, the engine descends the tree instead of reading every row:
# Descend the B-tree from the root (pseudocode)
node = root
while node is not a leaf:
# each internal node holds sorted separator keys + child pointers
i = first child whose key-range can contain 42
node = node.children[i]
# at the leaf: binary-search the sorted keys
if 42 in node.keys:
row_ptr = node.pointer_for(42) # jump straight to the row
return fetch(row_ptr)
return NOT FOUND
# depth of the tree is ~log(n), so only a few nodes are touched
The payoff depends on selectivity — what fraction of rows match. A highly selective predicate (a unique id or email) returns very few rows, so the seek is a big win. A low-selectivity predicate (like is_active = true matching half the table) returns so many rows that scanning is often cheaper, and the planner may ignore the index.
| Dimension | Full scan | Index seek |
|---|---|---|
| Read cost (find a row) | O(n) — reads every row | O(log n) — descends the tree |
| Best for | Returning most of the table | Highly selective lookups |
| Write cost | None extra | Slows INSERT / UPDATE / DELETE — the index must be kept sorted |
| Storage | None extra | Extra disk for the sorted copy + pointers |
| Tiny tables | Fine — a scan is already fast | Rarely worth the overhead |
An index is a read-time accelerator paid for at write time and in storage. Add one when a column is searched often and is selective; skip it when the table is small, the column repeats heavily, or the table is write-dominated.
WHERE lower(email) = 'a@b.com' can't use a plain index on email — the predicate is non-sargable. Index the expression itself (CREATE INDEX ON users (lower(email))) or store the normalized value.(last_name, first_name) helps WHERE last_name = ? but not WHERE first_name = ? alone — you can't skip the leading column.LIKE 'smith%' can use a B-tree (it's a prefix range), but LIKE '%smith' cannot — there is no sorted prefix to descend to.A users table has 5 million rows and you run the login lookup:
SELECT id, password_hash FROM users WHERE email = 'ada@calc.io';
With no index, the planner has no sorted path to email, so it does a sequential scan — up to 5,000,000 row reads on every login. Because email is unique, the predicate is maximally selective, which is exactly the case an index serves best:
CREATE UNIQUE INDEX idx_users_email ON users (email);
Now the same query descends a B-tree of depth ~3, binary-searches the matching leaf, and follows one pointer to the row — roughly log&sub2;(5,000,000) ≈ 23 comparisons instead of millions. The unique constraint also guarantees at most one match, so the engine can stop the moment it lands. The write cost is real but modest: each sign-up now maintains one extra sorted structure.
1. You have an index on users (email). Which query can use it for a fast seek?
2. Which column is the worst candidate for its own index?