The same query can be instant or painfully slow — it all turns on whether the planner can take a shortcut to your rows instead of reading the whole table.
When you send SQL to a database, you describe what you want, not how to get it. A component called the query planner decides the how — it picks an access path for reaching your rows.
With no help, the planner reads every row and checks each one: a sequential scan, whose cost grows with the whole table. If a useful index exists, it can instead seek straight to the matching rows. Identical SQL, wildly different speed — the difference is the path the planner chose.
SELECT * FROM orders WHERE customer_id = 42;
Every database has a planner (or optimizer). It estimates a numeric cost for each candidate plan — roughly how many rows it expects to touch — and runs the cheapest one. You can see its decision with EXPLAIN.
Cost is driven by selectivity: how small a fraction of the table the predicate keeps. customer_id = 42 might match a handful of rows out of millions. A sequential scan still reads all N rows to find them; an index scan walks a small B-tree (O(log n)) and fetches only the matches. The planner compares the estimates and picks the cheaper path.
-- Before: nothing to seek with, so it reads every row
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
Seq Scan on orders (cost=0.00..1850.00 rows=3 width=24)
Filter: (customer_id = 42)
-- rows removed by filter: ~99,997
-- Add a B-tree index on the filtered column
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- After: it seeks straight to the matches
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
Index Scan using idx_orders_customer on orders
(cost=0.42..12.10 rows=3 width=24)
Index Cond: (customer_id = 42)
The estimated cost dropped from ~1850 to ~12 — the planner now expects to touch a tiny fraction of the table.
| Aspect | Seq scan | Index scan |
|---|---|---|
| Rows examined | O(n) — reads the whole table | O(log n + k) — descend tree, fetch k matches |
| Best when | Predicate matches a large share of rows, or table is tiny | Predicate is selective (few matching rows) |
| Write cost | None added | Every insert / update / delete must also maintain the index |
| Storage | None added | The index is a separate on-disk structure to store |
| Risk | Predictable but slow at scale | Planner can mis-estimate with stale stats and pick the wrong path |
An index is not free: it speeds reads but taxes writes and uses storage. Index the columns you actually filter and join on — not every column.
ANALYZE — outdated stats lead to a bad plan, e.g. a seq scan where a seek would win.WHERE LOWER(email) = 'a@b.com' or WHERE customer_id + 0 = 42 — hides it from the index and forces a scan. Keep the column bare on one side, or build a matching expression index.bigint to a string (WHERE customer_id = '42') can make the engine cast the column instead of the literal, disabling the index. Match the literal's type to the column.SELECT * defeating a covering index. An index that holds only the filter column still needs a heap lookup per match to fetch other columns. Selecting just the columns you need can let an index cover the query and skip the heap.IN (...)) instead of many round trips.A nightly report runs SELECT * FROM orders WHERE customer_id = 42 over a 100,000-row table and takes ~420 ms. You check why:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
Seq Scan on orders (cost=0.00..1850.00 rows=3 width=24)
(actual time=0.31..418.7 rows=3 loops=1)
Filter: (customer_id = 42)
Rows Removed by Filter: 99997
Execution Time: 419.4 ms
The plan is a Seq Scan that reads 100,000 rows to return 3. The predicate is highly selective, so an index should pay off. You add one and refresh stats:
CREATE INDEX idx_orders_customer ON orders (customer_id);
ANALYZE orders;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
Index Scan using idx_orders_customer on orders
(cost=0.42..12.10 rows=3 width=24)
(actual time=0.04..0.07 rows=3 loops=1)
Index Cond: (customer_id = 42)
Execution Time: 0.12 ms
The plan flipped from Seq Scan to Index Scan, rows examined fell from 100,000 to ~3, and latency dropped from ~420 ms to under 1 ms — a roughly 3,000× speedup from one well-chosen index.
1. Your filter is WHERE LOWER(email) = 'sam@x.com' and you already have an index on email. The query still does a seq scan. Why?
2. A query matches 90% of the rows in a large table. The planner ignores your index and does a seq scan anyway. Is that a bug?