Secondary indexes

A table is sorted by its primary key. To find rows by any other column quickly, you keep a second little sorted book that points back to them.

The idea

A table is physically organized by its primary key — rows are kept in id order, so finding a row by id is a quick jump. But a query like "all users in the eng department" filters on a non-primary column. With nothing to help, the engine reads every row and checks its dept: a full scan, O(n).

A secondary index is an auxiliary sorted structure that maps a column value to the primary keys that have it: dept → [ids]. Now the engine looks up the value, gets the matching ids straight away, and fetches just those rows. The index is maintained on every write that touches the indexed column, and consulted on every read that filters by it.

Rows are stored in id order — finding by id is a jump, finding by dept is not.

How it works

The index is just a sorted map kept beside the table. Two operations matter: maintain it whenever an indexed column is written, and look it up when a read filters on that column. A covering index also stores the columns the query returns, so the read never touches the base table; a non-covering index gives back only ids and must fetch each row.

# maintain on every write that touches the indexed column
def write(id, old_dept, new_dept):
    if old_dept is not None:
        index[old_dept].remove(id)   # leave the stale bucket
    index[new_dept].add(id)          # join the new bucket, kept sorted

# read: filter by dept
def find_by_dept(dept):
    ids = index.get(dept, [])        # one lookup, not a full scan
    if COVERING:                     # index already holds the wanted columns
        return [(i, index_cols[i]) for i in ids]
    else:                            # non-covering: one base fetch per id
        return [base_table[i] for i in ids]   # N back-trips

The lookup cost is roughly the depth of the index plus the number of matches. The hidden cost lives in the non-covering branch: one base-table fetch per matching id — read amplification that a covering index removes by carrying the answer itself.

Signals

Choice	Buys you	Costs you
Add a secondary index	Filter reads on that column go from `O(n)` to a lookup	Every write to the column updates the index too; extra storage
Covering index	Read answered from the index alone, no base fetches	More storage, more to maintain per write
Non-covering index	Smaller, cheaper to keep current	One base-table fetch per matching row — read amplification
High-cardinality column	Each value selects very few rows, big speedup	Low-cardinality columns barely beat a full scan

Watch out for

Every index taxes writes. An insert or update must maintain each index on the touched columns — more indexes, slower writes.
A non-covering index still pays per result. Matching a thousand ids means a thousand base-table fetches; covering removes those trips.
Low-cardinality indexes barely help. An index on a boolean or a three-value column returns much of the table — not far from a scan.
An index is only used if the query matches its leading column and order. A composite index on (dept, name) won't speed a query that filters on name alone.
Over-indexing. Indexing "just in case" balloons storage and write cost for reads that never come — index for queries you actually run.

Worked example

Query: SELECT name FROM users WHERE dept = 'eng'. With a secondary index on dept, the engine looks up the eng bucket and gets [1, 3] at once — rows 2 and 4 are never read. If the index is non-covering, it then fetches base rows 1 and 3 to read their name: two extra back-trips, one per match. If the index is covering — it also stores name — the bucket already holds (1, ana) and (3, cy), so the answer returns with zero base fetches. Same lookup; covering trades storage to skip the read amplification.

Check yourself

A non-covering index on dept finds 500 matching ids for a query. What extra work happens before the rows are returned?

Coach note: a non-covering index returns only primary keys, so every match needs a trip back to the base table — that's the read amplification a covering index removes. Take another pass if covering vs non-covering still blurs together.