Secondary Index

Looking up data when you don't know the Primary Key.

The idea

Every table usually has a Primary Index on its ID column. Because it's a B-Tree, looking up ID=5 takes microseconds. But what if a user wants to log in using their email? The database can't use the Primary Index because it's sorted by ID, not email. It would have to scan every single row on disk (a Sequential Scan). To fix this, we build a Secondary Index: a completely separate B-Tree sorted by email.

Step 1: The Main Table is physically sorted by the Primary Key (ID).

How it works (The Extra Hop)

A Secondary Index is literally just a copy of the target column (email) and a pointer. But unlike the Primary Index, the pointer doesn't hold the row's data. Instead, it points to the Primary Key (or to the physical memory block). To run SELECT * FROM users WHERE email='x', the database searches the Secondary Index, finds the ID, and then does a second search on the Primary Index to get the actual row data.

-- This query is fast automatically (Primary Key lookup)
SELECT * FROM users WHERE id = 1042;

-- This query is slow (Sequential Scan of entire table)
SELECT * FROM users WHERE email = 'alice@example.com';

-- Creating a Secondary Index fixes it
CREATE INDEX idx_users_email ON users(email);
-- Now it takes O(log N) + O(log N) time!

Cost

Secondary Indexes aren't free. Every time you INSERT or UPDATE a row in the main table, the database must also update the Primary Index and every Secondary Index you created. If you have 10 indexes on a table, a single insert does 11 physical disk writes! This is why you should never just "index every column just in case".

Watch out for

Low Selectivity: If you put a secondary index on a boolean column like is_deleted, the index is useless. If 50% of your users are deleted, the B-Tree doesn't help narrow down the search. The database optimizer will just ignore the index and do a full table scan anyway.