Keep only the best few you've seen so far, and let the weakest of them guard the door.
You have a huge stream of values — millions of rows — and you only want the top n. You don't need to sort everything. You only need to remember the best n seen so far, in a tiny min-heap of size n.
The smallest of those n sits at the root. That root is the bar to beat: any new value larger than it deserves a spot, so you evict the root and push the newcomer. Anything smaller can't crack the top n, so you discard it on the spot. When the stream ends, the heap holds the top n.
Python's heapq is a min-heap: heap[0] is always the smallest item. That's exactly the value we want guarding the door — the weakest of our current best.
import heapq
def top_n(stream, n):
heap = [] # a min-heap; heap[0] is the smallest kept value
for x in stream:
if len(heap) < n:
heapq.heappush(heap, x) # not full yet — just keep it
elif x > heap[0]: # x beats the weakest we're keeping
heapq.heapreplace(heap, x) # pop the min, push x, sift — in one step
# else: x can't crack the top n — discard it
return heap # the top n (UNSORTED — sort if you need ranking)
# ranked, highest first:
# sorted(top_n(stream, n), reverse=True)
The whole trick is that heap[0] — the current minimum — is the cheapest thing to compare against and the cheapest thing to evict. heapreplace does the pop-then-push in a single sift, so each accepted value costs one O(log n) rebalance.
| Aspect | Bounded min-heap | Full sort then slice |
|---|---|---|
| Time (m items) | O(m log n) | O(m log m) |
| Space | O(n) | O(m) |
| Held in memory | Only n values | All m values |
When n << m — top 10 out of 10 million — the heap is a runaway win: log n is tiny and constant while log m grows with the stream, and you never hold more than n items at once. That O(n) footprint is what lets the pattern run over a stream that doesn't fit in memory.
sorted(heap, reverse=True) at the end.(score, tiebreak, row) tuples or pass a key, or you'll keep the wrong n.x > heap[0] is false, so an equal newcomer is discarded — you keep the one you saw first. If ties must resolve a specific way, fold a deterministic tiebreaker into the key.You're scanning a 10-million-row table for the top 3 highest-scoring rows — the database's ORDER BY score DESC LIMIT 3. Sorting all 10M rows just to keep 3 would be wasteful. Instead, sweep the rows once through a min-heap of size 3.
Say scores arrive as 5, 9, 1, 8, 4, 12, 7. The first three fill the heap: {1, 5, 9}, root (the bar) is 1. Then 8 > 1 — evict 1, push 8, heap {5, 8, 9}, the bar rises to 5. Next 4 <= 5 — discard. Then 12 > 5 — evict 5, push 12, heap {8, 9, 12}, bar is 8. Finally 7 <= 8 — discard. The heap holds {8, 9, 12}; sorted descending that's the top 3: 12, 9, 8 — found in one pass, holding only three rows in memory the whole time.
You're keeping the top 3 in a min-heap that currently holds {8, 9, 12} (root = 8). A new value 10 arrives. What happens?
Your top-3 heap finishes holding {8, 9, 12}. A teammate asks for the results ranked highest first. What do you hand back?