Data modelling & indexing

Designing your schema and indexes around how the data is actually queried.

The idea

If you query a database without an index, it must perform a Full Table Scan, reading every single row to find matches (O(N) time). As data grows, this becomes painfully slow.

By adding an Index (like a B-Tree), the database creates a sorted side-structure mapping values to their exact disk locations. The query planner can now do an Index Seek, jumping straight to the row in O(log N) time. The tradeoff is that every Write must now also update the index.

Disk (Table Rows)
Cost: 30 Disk Reads (Full Scan)
Table has 30 rows randomly scattered on disk.

How it works (Query Planner)

# The DB optimizer chooses between a Scan and a Seek.

def execute_query(target_value):
    if has_index(target_value):
        # O(log N) - Fast B-Tree traversal
        row_id = btree_index.search(target_value)
        return disk.read(row_id)
    else:
        # O(N) - Slow sequential scan of entire disk
        for row in disk.read_all():
            if row.val == target_value:
                return row