Database internals

How databases physically execute queries, like joining two tables together efficiently.

The idea

A SQL JOIN doesn't just happen by magic. If Table A has 1000 rows and Table B has 1000 rows, comparing every row to every other row (Nested Loop Join) takes 1,000,000 operations.

Instead, databases often use a Hash Join. First, they scan the smaller table and build an in-memory Hash Table using the join column as the key. Then, they scan the larger table (the Probe Phase) and simply look up each row in the Hash Table. This drops the time complexity from O(A × B) down to O(A + B)!

Query: SELECT * FROM Users JOIN Orders ON Users.id = Orders.user_id

How it works (Hash Join)

def hash_join(table_a, table_b, join_col_a, join_col_b):
    # Phase 1: Build Hash Table on the smaller table (A)
    hash_map = {}
    for row_a in table_a:
        key = row_a[join_col_a]
        if key not in hash_map:
            hash_map[key] = []
        hash_map[key].append(row_a)
        
    # Phase 2: Probe with the larger table (B)
    results = []
    for row_b in table_b:
        key = row_b[join_col_b]
        if key in hash_map:
            for matched_row in hash_map[key]:
                results.append(matched_row + row_b)
                
    return results