Database Aggregation

Turning millions of rows into a single summary table using math.

The idea

If you run an e-commerce site, your orders table might have millions of rows. If your CEO asks "What is the total revenue per country?", you could download all 10 million rows to your Python server and loop through them to do the math. But that is incredibly slow and wastes network bandwidth. Instead, we use Database Aggregation to push the math down to the database engine itself.

Step 1: The Raw Data. We have an orders table with country codes and purchase amounts.

How it works (GROUP BY)

In SQL, the GROUP BY clause tells the database to bundle rows together based on a shared column (like 'country'). Once bundled, you apply an Aggregate Function—like SUM(), AVG(), COUNT(), or MAX()—to compress the bundle down to a single row.

# The inefficient way (Application side):
orders = db.query("SELECT country, amount FROM orders") # Downloads 10M rows!
totals = {}
for order in orders:
    totals[order.country] += order.amount 

# The correct way (Database Aggregation):
result = db.query("""
    SELECT country, SUM(amount) as total_revenue
    FROM orders
    GROUP BY country;
""") # Downloads exactly 195 rows. Instant.

Cost

While faster than downloading the raw data, doing SUM() over 10 million rows on the fly still burns CPU. If this is a dashboard that users refresh constantly, the database will struggle. The solution is Materialized Views or pre-aggregating the data in background jobs (e.g. calculating yesterday's totals exactly once at midnight).

Watch out for

The HAVING clause: What if you only want to see countries with revenue > $10,000? You cannot use a WHERE clause, because WHERE filters raw rows before aggregation. To filter on the aggregated sum, you must use HAVING SUM(amount) > 10000.