The order in which tasks are picked up is not the order in which they finish: a quick errand started later can beat a long one started first.
A thread pool has a fixed number of workers and a queue of tasks. Workers pull tasks in queue order, but each task takes a different amount of time. So start order (when a worker picks a task up) and completion order (when it's done) are two different sequences — and that's the whole subtlety.
Two ideas follow. First, results come back out of order, so anything that needs to recombine them must key by task id, not arrival. Second, a few slow tasks can hog every worker and make fast tasks wait behind them — head-of-line blocking. Knowing the difference between “submitted”, “started”, and “completed” is what lets you reason about latency at all.
Each worker loops: take the next task, run it, publish the result keyed by task id, repeat. We simulate a discrete clock: at each tick a free worker grabs the front of the queue; a task finishes when its remaining time hits zero. The completed list shows results landing in completion-time order, which depends on both when a task started and how long it ran, not in submission order.
# A worker just loops over the shared queue
def worker(queue, results):
    while True:
        task = queue.get()         # FIFO: start order = submission order
        out = task.run()           # but run time varies per task
        results[task.id] = out     # key by id: completion order differs
        queue.task_done()
# The caller must NOT assume results come back in submission order.
# Re-associate by id, e.g. with futures:
from concurrent.futures import as_completed

futures = {pool.submit(t.run): t.id for t in tasks}
for fut in as_completed(futures):      # yields in COMPLETION order
    results[futures[fut]] = fut.result()
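If you instead iterate the futures in submission order and call .result(), you block on the first task even if later ones have already finished. Every result is still correct, but a finished result cannot be handled until every task submitted before it has completed, so downstream processing is serialised behind the slowest early task.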
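The re-association pattern can be tried end to end with the standard library. This is a minimal sketch, not a reference implementation: `slow_square`, the task ids, and the sleep durations are all made up for illustration, and `time.sleep` stands in for real work of varying length.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_square(n, delay):
    """Hypothetical task: sleep for `delay` seconds, then return n squared."""
    time.sleep(delay)
    return n * n

# (task id, argument, simulated run time in seconds); values are illustrative
jobs = [("a", 2, 0.3), ("b", 3, 0.05), ("c", 4, 0.15)]

results = {}   # task id -> result
arrival = []   # task ids in the order their results were collected

with ThreadPoolExecutor(max_workers=3) as pool:
    # Map each future back to the id of the task it represents.
    futures = {pool.submit(slow_square, n, delay): task_id
               for task_id, n, delay in jobs}
    for fut in as_completed(futures):      # yields futures as they finish
        task_id = futures[fut]
        results[task_id] = fut.result()
        arrival.append(task_id)

# Rebuild submission order from the ids, whatever order the results arrived in.
ordered = [results[task_id] for task_id, _, _ in jobs]
print("arrival order:", arrival)
print("results in submission order:", ordered)
```

With these delays the arrival order will usually differ from the submission order, but thread scheduling makes it non-deterministic, so the only thing the code relies on is the id lookup.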
| State or symptom | Meaning |
|---|---|
| Submitted | In the queue, no worker yet |
| Started | A worker picked it up (start order = queue order) |
| Completed | Result ready (order depends on duration) |
| Queue length growing | Arrival rate > throughput; add workers or shed load |
| Fast tasks high-latency | Head-of-line blocking behind slow tasks |
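The three states in the table translate directly into a latency breakdown. The sketch below assumes you record a timestamp at each transition; the `TaskTimes` class and its field names are hypothetical, chosen only to mirror the table.

```python
from dataclasses import dataclass

@dataclass
class TaskTimes:
    """Hypothetical record of the three timestamps for one task."""
    submitted: float   # entered the queue
    started: float     # a worker picked it up
    completed: float   # result ready

    @property
    def queue_wait(self):
        # Time spent waiting for a free worker
        return self.started - self.submitted

    @property
    def run_time(self):
        # Time spent actually executing
        return self.completed - self.started

    @property
    def latency(self):
        # What the caller experiences: wait plus run
        return self.completed - self.submitted

# A short task (1 time unit of work) that waited 4 units for a worker
t = TaskTimes(submitted=0, started=4, completed=5)
print(t.queue_wait, t.run_time, t.latency)
```

A task whose queue wait dwarfs its run time, as here, is the signature of the last row of the table: the task itself is fast, but it sat behind slower work.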
Submit tasks A(4), B(1), C(1), D(3), E(1) (numbers are run times) to 2 workers. Start order is A, B, then whoever frees up. B finishes at t=1 and that worker grabs C; C finishes at t=2 and that worker grabs D, which runs until t=5. A keeps its worker busy until t=4; only then does E start, and it finishes at t=5. So completion order is B, C, A, then D and E together at t=5, not the submission order A, B, C, D, E. B and C slip past the long task A, but E, though just as short, waits four ticks for a free worker: head-of-line blocking. Total wall-clock time is 5 even though the work sums to 10, because two workers ran in parallel.
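The schedule for these five tasks can be checked mechanically with a small tick-based simulation. This is a sketch under stated assumptions: durations are positive integers, free workers are served lowest index first, and the function name `simulate` is our own.

```python
from collections import deque

def simulate(tasks, num_workers):
    """Tick-based simulation of a FIFO queue served by a fixed worker pool.

    tasks: list of (task_id, duration) in submission order; durations are
    assumed to be positive integers.
    Returns (started, finished): dicts mapping task id to clock time.
    """
    queue = deque(tasks)
    running = [None] * num_workers   # per worker: [task_id, remaining] or None
    started, finished = {}, {}
    now = 0
    while queue or any(running):
        # Each free worker takes the task at the front of the queue.
        for i in range(num_workers):
            if running[i] is None and queue:
                task_id, duration = queue.popleft()
                running[i] = [task_id, duration]
                started[task_id] = now
        # Advance the clock one tick and retire tasks that reach zero.
        now += 1
        for i, slot in enumerate(running):
            if slot is not None:
                slot[1] -= 1
                if slot[1] == 0:
                    finished[slot[0]] = now
                    running[i] = None
    return started, finished

started, finished = simulate(
    [("A", 4), ("B", 1), ("C", 1), ("D", 3), ("E", 1)], num_workers=2
)
# Sort by finish time, breaking ties by id so the output is stable.
for task_id in sorted(finished, key=lambda t: (finished[t], t)):
    print(task_id, "started", started[task_id], "finished", finished[task_id])
print("wall-clock time:", max(finished.values()))
```

Under these assumptions the simulation reports B finishing at 1, C at 2, A at 4, and D and E both at 5, for a wall-clock time of 5. E does not start until t=4, even though it needs only one tick of work.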
You submit tasks in order and read their futures in submission order, calling .result() on each. What happens?