Finding the most recent event for every user in a log table.
Many systems use "append-only" tables. When a user updates their profile, you don't overwrite the old row; you insert a new row with a newer timestamp. This creates an audit log. But when you need to query "the current state of all users," you have a problem. You have duplicates. You need to group by user, but then extract ONLY the row with the maximum timestamp from each group. A standard GROUP BY struggles with this because it crushes the data down.
The modern, highly-performant way to solve this is using Window Functions (specifically ROW_NUMBER() OVER (...)). This allows the database to partition the data into logical groups, sort each group internally, and assign a rank (1, 2, 3...) to each row, without crushing the rows together like GROUP BY does.
-- Find the latest status for every user
WITH RankedLogs AS (
SELECT
user_id,
status,
timestamp,
-- Group by user, sort newest to oldest, assign a row number
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY timestamp DESC) as rn
FROM user_logs
)
-- Filter to only grab the #1 newest row per user!
SELECT user_id, status, timestamp
FROM RankedLogs
WHERE rn = 1;
Window functions require sorting. If you have 10 million rows, ORDER BY timestamp DESC is expensive. To make this query lightning fast, you must have a composite index on exactly the columns you are partitioning and sorting by: CREATE INDEX idx_user_time ON user_logs(user_id, timestamp DESC);
SELECT DISTINCT ON (user_id) user_id, status FROM user_logs ORDER BY user_id, timestamp DESC;. It is often slightly faster than Window Functions!