Query top n

Keep only the best few you've seen so far, and let the weakest of them guard the door.

The idea

You have a huge stream of values — millions of rows — and you only want the top n. You don't need to sort everything. You only need to remember the best n seen so far, in a tiny min-heap of size n.

The smallest of those n sits at the root. That root is the bar to beat: any new value larger than it deserves a spot, so you evict the root and push the newcomer. Anything smaller can't crack the top n, so you discard it on the spot. When the stream ends, the heap holds the top n.

Press play to stream values into the heap.

How it works

Python's heapq is a min-heap: heap[0] is always the smallest item. That's exactly the value we want guarding the door — the weakest of our current best.

import heapq

def top_n(stream, n):
    heap = []                      # a min-heap; heap[0] is the smallest kept value
    for x in stream:
        if len(heap) < n:
            heapq.heappush(heap, x)        # not full yet — just keep it
        elif x > heap[0]:                  # x beats the weakest we're keeping
            heapq.heapreplace(heap, x)     # pop the min, push x, sift — in one step
        # else: x can't crack the top n — discard it
    return heap                    # the top n (UNSORTED — sort if you need ranking)

# ranked, highest first:
#   sorted(top_n(stream, n), reverse=True)

The whole trick is that heap[0] — the current minimum — is the cheapest thing to compare against and the cheapest thing to evict. heapreplace does the pop-then-push in a single sift, so each accepted value costs one O(log n) rebalance.

Cost

AspectBounded min-heapFull sort then slice
Time (m items)O(m log n)O(m log m)
SpaceO(n)O(m)
Held in memoryOnly n valuesAll m values

When n << m — top 10 out of 10 million — the heap is a runaway win: log n is tiny and constant while log m grows with the stream, and you never hold more than n items at once. That O(n) footprint is what lets the pattern run over a stream that doesn't fit in memory.

Watch out for

Worked example

You're scanning a 10-million-row table for the top 3 highest-scoring rows — the database's ORDER BY score DESC LIMIT 3. Sorting all 10M rows just to keep 3 would be wasteful. Instead, sweep the rows once through a min-heap of size 3.

Say scores arrive as 5, 9, 1, 8, 4, 12, 7. The first three fill the heap: {1, 5, 9}, root (the bar) is 1. Then 8 > 1 — evict 1, push 8, heap {5, 8, 9}, the bar rises to 5. Next 4 <= 5 — discard. Then 12 > 5 — evict 5, push 12, heap {8, 9, 12}, bar is 8. Finally 7 <= 8 — discard. The heap holds {8, 9, 12}; sorted descending that's the top 3: 12, 9, 8 — found in one pass, holding only three rows in memory the whole time.

Check yourself

You're keeping the top 3 in a min-heap that currently holds {8, 9, 12} (root = 8). A new value 10 arrives. What happens?

Your top-3 heap finishes holding {8, 9, 12}. A teammate asks for the results ranked highest first. What do you hand back?