SQL query performance

The same query can be instant or painfully slow — it all turns on whether the planner can take a shortcut to your rows instead of reading the whole table.

The idea

When you send SQL to a database, you describe what you want, not how to get it. A component called the query planner decides the how — it picks an access path for reaching your rows.

With no help, the planner reads every row and checks each one: a sequential scan, whose cost grows with the whole table. If a useful index exists, it can instead seek straight to the matching rows. Identical SQL, wildly different speed — the difference is the path the planner chose.

See it work

SELECT * FROM orders WHERE customer_id = 42;

orders (id, customer_id, total)
no index on customer_id
Click play to run the query with a sequential scan.

How it works

Every database has a planner (or optimizer). It estimates a numeric cost for each candidate plan — roughly how many rows it expects to touch — and runs the cheapest one. You can see its decision with EXPLAIN.

Cost is driven by selectivity: how small a fraction of the table the predicate keeps. customer_id = 42 might match a handful of rows out of millions. A sequential scan still reads all N rows to find them; an index scan walks a small B-tree (O(log n)) and fetches only the matches. The planner compares the estimates and picks the cheaper path.

-- Before: nothing to seek with, so it reads every row
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
  Seq Scan on orders  (cost=0.00..1850.00 rows=3 width=24)
    Filter: (customer_id = 42)
    -- rows removed by filter: ~99,997

-- Add a B-tree index on the filtered column
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- After: it seeks straight to the matches
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
  Index Scan using idx_orders_customer on orders
        (cost=0.42..12.10 rows=3 width=24)
    Index Cond: (customer_id = 42)

The estimated cost dropped from ~1850 to ~12 — the planner now expects to touch a tiny fraction of the table.

Cost / trade-offs

AspectSeq scanIndex scan
Rows examinedO(n) — reads the whole tableO(log n + k) — descend tree, fetch k matches
Best whenPredicate matches a large share of rows, or table is tinyPredicate is selective (few matching rows)
Write costNone addedEvery insert / update / delete must also maintain the index
StorageNone addedThe index is a separate on-disk structure to store
RiskPredictable but slow at scalePlanner can mis-estimate with stale stats and pick the wrong path

An index is not free: it speeds reads but taxes writes and uses storage. Index the columns you actually filter and join on — not every column.

Watch out for

Worked example

A nightly report runs SELECT * FROM orders WHERE customer_id = 42 over a 100,000-row table and takes ~420 ms. You check why:

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
  Seq Scan on orders  (cost=0.00..1850.00 rows=3 width=24)
                      (actual time=0.31..418.7 rows=3 loops=1)
    Filter: (customer_id = 42)
    Rows Removed by Filter: 99997
  Execution Time: 419.4 ms

The plan is a Seq Scan that reads 100,000 rows to return 3. The predicate is highly selective, so an index should pay off. You add one and refresh stats:

CREATE INDEX idx_orders_customer ON orders (customer_id);
ANALYZE orders;

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
  Index Scan using idx_orders_customer on orders
        (cost=0.42..12.10 rows=3 width=24)
        (actual time=0.04..0.07 rows=3 loops=1)
    Index Cond: (customer_id = 42)
  Execution Time: 0.12 ms

The plan flipped from Seq Scan to Index Scan, rows examined fell from 100,000 to ~3, and latency dropped from ~420 ms to under 1 ms — a roughly 3,000× speedup from one well-chosen index.

Check yourself

1. Your filter is WHERE LOWER(email) = 'sam@x.com' and you already have an index on email. The query still does a seq scan. Why?

2. A query matches 90% of the rows in a large table. The planner ignores your index and does a seq scan anyway. Is that a bug?