Blocking accept with no timeout

A server thread that blocks forever on a stalled connection is a thread you'll never get back.

The idea

A classic thread-per-connection server calls accept() to take a client socket, then read() to pull the request off the wire. If the client socket has no read timeout, a client that connects but never sends (a half-open or slowloris-style connection) leaves that worker thread parked in read() indefinitely.

With a fixed thread pool, it takes only as many stalled connections as there are workers to silently consume the whole pool, after which healthy clients can no longer be served. It's a denial of service with no crash and no log line. The fix is a socket read deadline, so a stalled connection is reaped and its thread returns to the pool.

How it works

The bug is the absence of a deadline. A blocking recv() on a socket with no timeout has no upper bound — it waits as long as the peer keeps the connection open but silent.

# BUG — the worker blocks here forever if the client never sends
conn, addr = srv.accept()      # accept() can also block with no timeout
data = conn.recv(1024)         # parks the thread indefinitely on a stalled peer
handle(data)
conn.close()
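
To see the missing deadline directly, here is a minimal sketch that assumes nothing beyond the standard library. The helper name blocked_after is hypothetical; it uses socket.socketpair() as a stand-in for a client that connects and stays silent, and checks whether a worker thread is still parked in recv() after a short wait.

```python
import socket
import threading

def blocked_after(wait_s: float) -> bool:
    """Return True if recv() on a silent peer is still blocked after wait_s."""
    ours, peer = socket.socketpair()      # connected pair; peer stays silent
    assert ours.gettimeout() is None      # default: blocking, no deadline

    received = []
    worker = threading.Thread(target=lambda: received.append(ours.recv(1024)))
    worker.start()
    worker.join(timeout=wait_s)           # give recv() a chance to return
    still_blocked = worker.is_alive()     # True means the thread is parked

    peer.sendall(b"x")                    # unblock the worker so we can clean up
    worker.join()
    ours.close()
    peer.close()
    return still_blocked

if __name__ == "__main__":
    print(blocked_after(0.2))
```

However long wait_s is, the worker stays parked until the peer finally sends or closes, which is exactly the behavior a stalled client exploits.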

The fix gives the socket a read deadline. When the peer stays silent past the deadline, recv() raises, you close the connection, and the worker thread is freed back to the pool.

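# FIX: bound the wait, reap the stall, return the thread
import socket

conn, addr = srv.accept()      # srv is the listening socket from the snippet above
conn.settimeout(5.0)           # or setsockopt SO_RCVTIMEO at the OS level
try:
    data = conn.recv(1024)     # raises socket.timeout after 5 s of silence
    if data:                   # b"" means the peer closed without sending
        handle(data)
except socket.timeout:
    pass                       # stalled peer: fall through and reap it
finally:
    conn.close()               # always release the socket and the thread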
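
The fragment above leaves srv and handle undefined, so here is a self-contained sketch of the same pattern that can be run as is. The names serve_one and demo are hypothetical, the server binds to a loopback port chosen by the OS, and the 0.2-second deadline is only there to keep the demonstration fast.

```python
import socket

def serve_one(srv: socket.socket, deadline_s: float) -> str:
    """Accept one client and read one request, bounded by deadline_s."""
    conn, _addr = srv.accept()
    conn.settimeout(deadline_s)           # the read deadline
    try:
        data = conn.recv(1024)
        return "request" if data else "closed"
    except socket.timeout:
        return "timeout"                  # stalled peer, reaped
    finally:
        conn.close()                      # runs on every path

def demo() -> str:
    srv = socket.create_server(("127.0.0.1", 0))           # port 0: OS picks a free port
    stalled = socket.create_connection(srv.getsockname())  # connects, never sends
    try:
        return serve_one(srv, deadline_s=0.2)
    finally:
        stalled.close()
        srv.close()

if __name__ == "__main__":
    print(demo())
```

Because the close sits in a finally block, the socket is released whether the read succeeds, times out, or fails in some other way.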

At the OS level, the equivalent is the SO_RCVTIMEO socket option, set with setsockopt(SO_RCVTIMEO, …). Better still, non-blocking or async I/O (selectors, epoll, kqueue) drops the one-thread-per-connection model entirely: a single thread multiplexes thousands of sockets, and a stalled peer ties up only a file descriptor and a little memory, not a thread.
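
As a rough illustration of the multiplexing approach, the sketch below uses Python's selectors module to serve connections from a single thread and to sweep out sockets that have been silent for too long. It is an assumption-laden toy, not a production event loop: run_loop, demo, the one-request-per-connection policy, and the counters are all hypothetical, and "serving" a request just means counting it.

```python
import selectors
import socket
import time

def run_loop(srv: socket.socket, idle_s: float, run_for_s: float) -> dict:
    """Single-threaded loop: serve ready sockets, reap ones idle past idle_s."""
    sel = selectors.DefaultSelector()
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ)
    last_seen = {}                         # conn -> time it was accepted
    stats = {"served": 0, "reaped": 0}
    stop_at = time.monotonic() + run_for_s

    def drop(conn):
        sel.unregister(conn)
        conn.close()
        del last_seen[conn]

    while time.monotonic() < stop_at:
        for key, _events in sel.select(timeout=0.05):
            if key.fileobj is srv:
                conn, _addr = srv.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
                last_seen[conn] = time.monotonic()
                continue
            conn = key.fileobj
            try:
                data = conn.recv(1024)     # socket is ready, so this returns at once
            except BlockingIOError:
                continue                   # spurious wakeup; try again later
            except ConnectionError:
                data = b""
            if data:
                stats["served"] += 1       # stand-in for handle(data)
            drop(conn)                     # one request per connection
        now = time.monotonic()
        for conn in [c for c, t in last_seen.items() if now - t > idle_s]:
            drop(conn)                     # silent past the deadline
            stats["reaped"] += 1

    for conn in list(last_seen):
        drop(conn)
    sel.unregister(srv)
    sel.close()
    return stats

def demo() -> dict:
    srv = socket.create_server(("127.0.0.1", 0))
    addr = srv.getsockname()
    stalled = socket.create_connection(addr)   # never sends
    talker = socket.create_connection(addr)
    talker.sendall(b"ping")
    try:
        return run_loop(srv, idle_s=0.2, run_for_s=0.6)
    finally:
        stalled.close()
        talker.close()
        srv.close()
```

Note that the event loop still needs an idle deadline: without the sweep, the stalled socket would cost no thread but would hold its file descriptor for as long as the peer kept the connection open.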

Trade-offs

| Aspect | Cost | Signal to watch |
| --- | --- | --- |
| No timeout | Threads leak on every stalled connection | Pool usage climbing while CPU sits idle |
| Read timeout | Reaps stalls, but may cut slow-but-legitimate clients early | Rising count of timeout-triggered closes |
| Thread-per-connection | Simple to reason about, but bounded by pool size | Free workers trending toward zero |
| Non-blocking I/O | Scales to many idle sockets, but more complex code | Event-loop lag and ready-set size |
| Idle / connection caps | Defense in depth; rejects abusers, adds tuning | Per-IP connection counts and reject rate |
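
The connection-cap row can be sketched as a small thread-safe counter. Everything here is hypothetical (the ConnectionCap class, its method names, and the idea of calling try_acquire right after accept() and release when the connection closes); it only shows one plausible way to produce the per-IP counts and reject rate listed as signals.

```python
import threading
from collections import Counter

class ConnectionCap:
    """Track open connections per client IP and refuse any past a fixed cap."""

    def __init__(self, max_per_ip: int):
        self.max_per_ip = max_per_ip
        self.open = Counter()              # ip -> currently open connections
        self.rejected = 0                  # running total of refused connections
        self._lock = threading.Lock()

    def try_acquire(self, ip: str) -> bool:
        """Call after accept(); close the socket immediately if this returns False."""
        with self._lock:
            if self.open[ip] >= self.max_per_ip:
                self.rejected += 1
                return False
            self.open[ip] += 1
            return True

    def release(self, ip: str) -> None:
        """Call when a connection from ip is closed."""
        with self._lock:
            if self.open[ip] > 1:
                self.open[ip] -= 1
            else:
                self.open.pop(ip, None)    # drop empty entries so the map stays small
```

A cap like this limits how many workers one address can occupy, but it does not replace the read deadline: an attacker spread across many addresses still needs to be timed out.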

Worked example

Picture a server with a 50-thread pool fronting real users. An attacker opens 50 slowloris connections that complete the TCP handshake and then send nothing. Each recv() blocks with no deadline, so one by one all 50 workers park forever. Free threads hit zero, and the next legitimate user is queued or refused. CPU is near idle, nothing crashes, and no log line fires.

Now add conn.settimeout(5.0). Five seconds after each stalled recv() starts, it raises socket.timeout, the connection is closed, and that worker returns to the pool. Each attacker socket can hold a worker for at most five seconds, so free threads recover and real users keep getting served.
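
The scenario can be reproduced in miniature. The sketch below scales the pool down from 50 workers to 4 and the deadline from 5 seconds to 0.2 so it finishes quickly; healthy_client_served and its parameters are hypothetical names, and passing deadline_s=None reproduces the unbounded recv().

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

def healthy_client_served(deadline_s, workers=4, wait_s=1.0) -> bool:
    """Fill the pool with stalled clients, then report whether a real one is served."""
    srv = socket.create_server(("127.0.0.1", 0))
    addr = srv.getsockname()
    served = threading.Event()

    def worker(conn):
        conn.settimeout(deadline_s)        # None means no deadline (the bug)
        try:
            if conn.recv(1024):
                served.set()               # stand-in for handle(data)
        except OSError:                    # includes socket.timeout
            pass
        finally:
            conn.close()

    pool = ThreadPoolExecutor(max_workers=workers)
    attackers = [socket.create_connection(addr) for _ in range(workers)]
    for _ in attackers:
        pool.submit(worker, srv.accept()[0])    # every worker is now parked in recv()

    healthy = socket.create_connection(addr)
    healthy.sendall(b"GET / HTTP/1.0\r\n\r\n")
    pool.submit(worker, srv.accept()[0])        # queued behind the stalled workers

    result = served.wait(timeout=wait_s)        # was the healthy client served in time?

    for s in attackers + [healthy]:
        s.close()                               # closing the peers unblocks parked workers
    pool.shutdown(wait=True)
    srv.close()
    return result
```

With deadline_s=None the healthy request sits in the queue for the whole wait and the function returns False; with deadline_s=0.2 the stalled workers time out, the queued request runs, and it returns True.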

Check yourself

1. Your thread-per-connection server stops responding to new clients. CPU is near idle, nothing has crashed, and there are no errors in the logs. What's the most likely cause?

2. You add a read deadline to reap stalled sockets. Which detail matters most so the thread is actually recovered?