A table is sorted by its primary key. To find rows by any other column quickly, you keep a second little sorted book that points back to them.
A table is physically organized by its primary key — rows are kept in id order, so finding a row by id is a quick jump. But a query like "all users in the eng department" filters on a non-primary column. With nothing to help, the engine reads every row and checks its dept: a full scan, O(n).
A secondary index is an auxiliary sorted structure that maps a column value to the primary keys that have it: dept → [ids]. Now the engine looks up the value, gets the matching ids straight away, and fetches just those rows. The index is maintained on every write that touches the indexed column, and consulted on every read that filters by it.
The index is just a sorted map kept beside the table. Two operations matter: maintain it whenever an indexed column is written, and look it up when a read filters on that column. A covering index also stores the columns the query returns, so the read never touches the base table; a non-covering index gives back only ids and must fetch each row.
# maintain on every write that touches the indexed column
def write(id, old_dept, new_dept):
if old_dept is not None:
index[old_dept].remove(id) # leave the stale bucket
index[new_dept].add(id) # join the new bucket, kept sorted
# read: filter by dept
def find_by_dept(dept):
ids = index.get(dept, []) # one lookup, not a full scan
if COVERING: # index already holds the wanted columns
return [(i, index_cols[i]) for i in ids]
else: # non-covering: one base fetch per id
return [base_table[i] for i in ids] # N back-trips
The lookup cost is roughly the depth of the index plus the number of matches. The hidden cost lives in the non-covering branch: one base-table fetch per matching id — read amplification that a covering index removes by carrying the answer itself.
| Choice | Buys you | Costs you |
|---|---|---|
| Add a secondary index | Filter reads on that column go from O(n) to a lookup | Every write to the column updates the index too; extra storage |
| Covering index | Read answered from the index alone, no base fetches | More storage, more to maintain per write |
| Non-covering index | Smaller, cheaper to keep current | One base-table fetch per matching row — read amplification |
| High-cardinality column | Each value selects very few rows, big speedup | Low-cardinality columns barely beat a full scan |
(dept, name) won't speed a query that filters on name alone.Query: SELECT name FROM users WHERE dept = 'eng'. With a secondary index on dept, the engine looks up the eng bucket and gets [1, 3] at once — rows 2 and 4 are never read. If the index is non-covering, it then fetches base rows 1 and 3 to read their name: two extra back-trips, one per match. If the index is covering — it also stores name — the bucket already holds (1, ana) and (3, cy), so the answer returns with zero base fetches. Same lookup; covering trades storage to skip the read amplification.
A non-covering index on dept finds 500 matching ids for a query. What extra work happens before the rows are returned?
Coach note: a non-covering index returns only primary keys, so every match needs a trip back to the base table — that's the read amplification a covering index removes. Take another pass if covering vs non-covering still blurs together.