SQL query performance

The same query can be instant or painfully slow — it all turns on whether the planner can take a shortcut to your rows instead of reading the whole table.

The idea

When you send SQL to a database, you describe what you want, not how to get it. A component called the query planner decides the how — it picks an access path for reaching your rows.

With no help, the planner reads every row and checks each one: a sequential scan, whose cost grows with the whole table. If a useful index exists, it can instead seek straight to the matching rows. Identical SQL, wildly different speed — the difference is the path the planner chose.

See it work

SELECT * FROM orders WHERE customer_id = 42;

no index on customer_id

Click play to run the query with a sequential scan.

How it works

Every database has a planner (or optimizer). It estimates a numeric cost for each candidate plan — roughly how many rows it expects to touch — and runs the cheapest one. You can see its decision with EXPLAIN.

Cost is driven by selectivity: how small a fraction of the table the predicate keeps. customer_id = 42 might match a handful of rows out of millions. A sequential scan still reads all N rows to find them; an index scan walks a small B-tree (O(log n)) and fetches only the matches. The planner compares the estimates and picks the cheaper path.

-- Before: nothing to seek with, so it reads every row
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
  Seq Scan on orders  (cost=0.00..1850.00 rows=3 width=24)
    Filter: (customer_id = 42)
    -- rows removed by filter: ~99,997

-- Add a B-tree index on the filtered column
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- After: it seeks straight to the matches
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
  Index Scan using idx_orders_customer on orders
        (cost=0.42..12.10 rows=3 width=24)
    Index Cond: (customer_id = 42)

The estimated cost dropped from ~1850 to ~12 — the planner now expects to touch a tiny fraction of the table.

Cost / trade-offs

Aspect	Seq scan	Index scan
Rows examined	`O(n)` — reads the whole table	`O(log n + k)` — descend tree, fetch k matches
Best when	Predicate matches a large share of rows, or table is tiny	Predicate is selective (few matching rows)
Write cost	None added	Every insert / update / delete must also maintain the index
Storage	None added	The index is a separate on-disk structure to store
Risk	Predictable but slow at scale	Planner can mis-estimate with stale stats and pick the wrong path

An index is not free: it speeds reads but taxes writes and uses storage. Index the columns you actually filter and join on — not every column.

Watch out for

Stale statistics. The planner estimates costs from sampled stats. After a big load, run ANALYZE — outdated stats lead to a bad plan, e.g. a seq scan where a seek would win.
Non-sargable predicates. Wrapping the column in a function or arithmetic — WHERE LOWER(email) = 'a@b.com' or WHERE customer_id + 0 = 42 — hides it from the index and forces a scan. Keep the column bare on one side, or build a matching expression index.
Implicit type casts. Comparing an indexed bigint to a string (WHERE customer_id = '42') can make the engine cast the column instead of the literal, disabling the index. Match the literal's type to the column.
SELECT * defeating a covering index. An index that holds only the filter column still needs a heap lookup per match to fetch other columns. Selecting just the columns you need can let an index cover the query and skip the heap.
N+1 queries. A fast single query run once per row in a loop is slow in aggregate. Fetch in one set-based query (a join or IN (...)) instead of many round trips.

Worked example

A nightly report runs SELECT * FROM orders WHERE customer_id = 42 over a 100,000-row table and takes ~420 ms. You check why:

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
  Seq Scan on orders  (cost=0.00..1850.00 rows=3 width=24)
                      (actual time=0.31..418.7 rows=3 loops=1)
    Filter: (customer_id = 42)
    Rows Removed by Filter: 99997
  Execution Time: 419.4 ms

The plan is a Seq Scan that reads 100,000 rows to return 3. The predicate is highly selective, so an index should pay off. You add one and refresh stats:

CREATE INDEX idx_orders_customer ON orders (customer_id);
ANALYZE orders;

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
  Index Scan using idx_orders_customer on orders
        (cost=0.42..12.10 rows=3 width=24)
        (actual time=0.04..0.07 rows=3 loops=1)
    Index Cond: (customer_id = 42)
  Execution Time: 0.12 ms

The plan flipped from Seq Scan to Index Scan, rows examined fell from 100,000 to ~3, and latency dropped from ~420 ms to under 1 ms — a roughly 3,000× speedup from one well-chosen index.

Check yourself

1. Your filter is WHERE LOWER(email) = 'sam@x.com' and you already have an index on email. The query still does a seq scan. Why?

2. A query matches 90% of the rows in a large table. The planner ignores your index and does a seq scan anyway. Is that a bug?