Turning millions of rows into a single summary table using math.
If you run an e-commerce site, your orders table might have millions of rows. If your CEO asks "What is the total revenue per country?", you could download all 10 million rows to your Python server and loop through them to do the math. But that is incredibly slow and wastes network bandwidth. Instead, we use Database Aggregation to push the math down to the database engine itself.
In SQL, the GROUP BY clause tells the database to bundle rows together based on a shared column (like 'country'). Once bundled, you apply an Aggregate Function—like SUM(), AVG(), COUNT(), or MAX()—to compress the bundle down to a single row.
# The inefficient way (Application side):
orders = db.query("SELECT country, amount FROM orders") # Downloads 10M rows!
totals = {}
for order in orders:
totals[order.country] += order.amount
# The correct way (Database Aggregation):
result = db.query("""
SELECT country, SUM(amount) as total_revenue
FROM orders
GROUP BY country;
""") # Downloads exactly 195 rows. Instant.
While faster than downloading the raw data, doing SUM() over 10 million rows on the fly still burns CPU. If this is a dashboard that users refresh constantly, the database will struggle. The solution is Materialized Views or pre-aggregating the data in background jobs (e.g. calculating yesterday's totals exactly once at midnight).
WHERE clause, because WHERE filters raw rows before aggregation. To filter on the aggregated sum, you must use HAVING SUM(amount) > 10000.