Blocking accept with no timeout

A server thread that blocks forever on a stalled connection is a thread you'll never get back.

The idea

A classic thread-per-connection server calls accept() to take a client socket, then read() or recv() to pull the request off the wire. If the accepted socket has no read timeout, a client that connects but never sends (a half-open connection, or a slowloris attacker) leaves that worker thread parked in the read indefinitely.

With a fixed thread pool, it takes only as many stalled connections as there are workers to occupy the whole pool, after which healthy clients can no longer be served. It's a denial of service with no crash and no log line. The fix is a socket read deadline, so a stalled connection is reaped and its thread returns to the pool.

See it work

Press play to watch it run.

How it works

The bug is the absence of a deadline. A blocking recv() on a socket with no timeout has no upper bound — it waits as long as the peer keeps the connection open but silent.

# BUG — the worker blocks here forever if the client never sends
conn, addr = srv.accept()      # accept() can also block with no timeout
data = conn.recv(1024)         # parks the thread indefinitely on a stalled peer
handle(data)
conn.close()

The fix gives the socket a read deadline. When the peer stays silent past the deadline, recv() raises, you close the connection, and the worker thread is freed back to the pool.

# FIX — bound the wait, reap the stall, return the thread
conn, addr = srv.accept()
conn.settimeout(5.0)           # or setsockopt SO_RCVTIMEO at the OS level
try:
    data = conn.recv(1024)
    handle(data)
except socket.timeout:
    pass                       # stalled peer — reap it
finally:
    conn.close()               # always release the socket and the thread
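
To try the fix without a real server, here is a minimal runnable sketch. It assumes Python 3 and uses socket.socketpair() to stand in for a client that connects and never sends. The helper name read_request and the 0.2-second timeout are illustrative choices, not part of the snippet above.

```python
import socket
from typing import Optional


def read_request(conn: socket.socket, timeout: float = 5.0) -> Optional[bytes]:
    """Return the first chunk read from conn, or None if the peer stays silent."""
    conn.settimeout(timeout)      # each blocking call on conn now waits at most `timeout` seconds
    try:
        return conn.recv(1024)
    except socket.timeout:        # an alias of the built-in TimeoutError on Python 3.10+
        return None               # stalled peer: give up instead of parking the thread
    finally:
        conn.close()              # always release the socket


if __name__ == "__main__":
    # socketpair() returns two connected sockets; `peer` plays the silent client.
    server_side, peer = socket.socketpair()
    print(read_request(server_side, timeout=0.2))   # prints None after about 0.2 s
    peer.close()
```

The timeout applies to each blocking call separately, so this bounds one recv(), not the whole request.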

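At the OS level this is setsockopt(SO_RCVTIMEO, …). Better still, non-blocking or async I/O (selectors, epoll, kqueue) drops the one-thread-per-connection model entirely, so a single thread multiplexes thousands of sockets and a stalled peer costs a file descriptor and a little buffer memory rather than a whole thread. Idle sockets should still be closed after a deadline so descriptors do not pile up.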
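
As a rough illustration of the multiplexed approach, the sketch below serves several already-connected sockets from one thread using Python's selectors module and closes any that stay silent past an idle deadline. The function name multiplex, the dictionary-of-sockets interface, and the polling interval of a quarter of the timeout are assumptions made for the example. A production loop would also accept new connections and write responses.

```python
import selectors
import socket
import time


def multiplex(conns, idle_timeout=5.0):
    """Read from every socket in `conns` on one thread; reap the ones that idle.

    `conns` maps a label to a connected socket. Returns (received, reaped):
    the bytes read per label, and the set of labels closed for staying silent.
    """
    sel = selectors.DefaultSelector()
    received, last_seen, reaped = {}, {}, set()
    now = time.monotonic()
    for name, conn in conns.items():
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, data=name)
        received[name] = b""
        last_seen[name] = now

    def drop(conn):
        sel.unregister(conn)
        conn.close()

    while sel.get_map():                       # loop until every socket is closed
        # Wake up often enough to notice expired deadlines.
        for key, _ in sel.select(timeout=idle_timeout / 4):
            try:
                chunk = key.fileobj.recv(1024)
            except BlockingIOError:
                continue                       # spurious wakeup; nothing to read yet
            except ConnectionError:
                chunk = b""                    # treat a reset like a close
            if chunk:
                received[key.data] += chunk
                last_seen[key.data] = time.monotonic()
            else:
                drop(key.fileobj)              # peer closed the connection
        now = time.monotonic()
        for key in list(sel.get_map().values()):
            if now - last_seen[key.data] > idle_timeout:
                reaped.add(key.data)           # silent past the deadline
                drop(key.fileobj)
    sel.close()
    return received, reaped
```

No thread is ever parked on a single socket here: a silent peer only occupies a selector entry until its idle deadline passes.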

Trade-offs

Aspect	Cost	Signal to watch
No timeout	Threads leak on every stalled connection	Pool usage climbing while CPU sits idle
Read timeout	Reaps stalls, but may cut slow-but-legitimate clients early	Rising count of timeout-triggered closes
Thread-per-connection	Simple to reason about, but bounded by pool size	Free workers trending toward zero
Non-blocking I/O	Scales to many idle sockets, but more complex code	Event-loop lag and ready-set size
Idle / connection caps	Defense in depth; rejects abusers, adds tuning	Per-IP connection counts and reject rate
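
The last row of the table can be sketched as a small per-address counter. This is only one possible shape: the class name ConnectionLimiter, the cap of 10, and the rejected counter (standing in for the reject-rate signal) are all hypothetical and chosen for illustration.

```python
import threading
from collections import Counter


class ConnectionLimiter:
    """Track open connections per client address and refuse any above a cap."""

    def __init__(self, max_per_ip: int = 10):
        self.max_per_ip = max_per_ip
        self.rejected = 0                  # running count, useful as a reject-rate signal
        self._open = Counter()             # address -> currently open connections
        self._lock = threading.Lock()      # workers call this from many threads

    def try_acquire(self, ip: str) -> bool:
        """Reserve a slot for `ip`; return False if the address is at its cap."""
        with self._lock:
            if self._open[ip] >= self.max_per_ip:
                self.rejected += 1
                return False
            self._open[ip] += 1
            return True

    def release(self, ip: str) -> None:
        """Give the slot back when the connection closes, however it closed."""
        with self._lock:
            self._open[ip] -= 1
            if self._open[ip] <= 0:
                del self._open[ip]         # keep the table from growing without bound
```

A server would call try_acquire(addr[0]) right after accept(), close the socket immediately on False, and call release() in the same finally block that closes the connection.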

Watch out for

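Sockets usually default to blocking with no timeout at all, so waiting forever is what you get unless you opt out by setting a timeout deliberately.
Slowloris-style attacks hold many connections open and trickle bytes (or none) specifically to pin your workers. A per-read timeout alone does not stop a trickler, because every byte that arrives restarts the wait; bound the total time allowed per request as well.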
The failure is silent: no crash, no exception, no log — the pool just drains and new clients hang.
Too aggressive a timeout drops legitimately slow clients on weak networks or large uploads.
A read timeout alone is not enough: if the connect / accept path is left unbounded, or the handler forgets to close and reap the socket when the timeout fires, the leak stays in place.
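
One way to cover the trickling case from the list above is to bound both each individual read and the request as a whole. The sketch below is an assumption-laden example, not a prescribed implementation: the function name recv_until, the HTTP-style blank-line terminator, and the default limits are all illustrative.

```python
import socket
import time


def recv_until(conn, terminator=b"\r\n\r\n", per_read=5.0, total=30.0, max_bytes=65536):
    """Read until `terminator` arrives, bounding each recv() and the whole request."""
    deadline = time.monotonic() + total
    buf = b""
    while terminator not in buf:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("request exceeded its total deadline")
        # Never wait longer than the time left on the overall deadline.
        conn.settimeout(min(per_read, remaining))
        chunk = conn.recv(4096)            # raises socket.timeout if this one read stalls
        if not chunk:
            raise ConnectionError("peer closed before sending a full request")
        buf += chunk
        if len(buf) > max_bytes:
            raise ValueError("request too large")
    return buf
```

The caller is expected to catch socket.timeout and TimeoutError (the same class on Python 3.10+) and close the connection in a finally block, so a peer that sends one byte every few seconds is still cut off once the total deadline passes.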

Worked example

Picture a server with a 50-thread pool fronting real users. An attacker opens 50 slowloris connections that complete the TCP handshake and then send nothing. Each recv() blocks with no deadline, so one by one all 50 workers park forever. Free threads hit zero; the next legitimate user is queued or refused. CPU is near idle, nothing crashes, and no log fires — the same drain-to-zero you can step through in the animation above.

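Now add conn.settimeout(5.0). Five seconds after each stalled recv() starts, it raises socket.timeout, the connection is closed, and that worker returns to the pool. Each attacker socket now holds a worker for at most five seconds instead of forever, so workers keep cycling back to the pool and real users keep getting served, which is phase B of the visual. An attacker who reconnects continuously can still keep workers busy for those five seconds at a time, which is where the connection caps from the trade-offs table come in.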
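
The same scenario can be reproduced at a smaller scale. The sketch below is a hedged approximation rather than the setup behind the animation: it assumes a 4-worker ThreadPoolExecutor in place of the 50-thread pool, uses socket.socketpair() to play the silent attackers, and the names worker and real_client_served are made up for the demo.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def worker(conn, read_timeout):
    """Handle one connection; read_timeout=None reproduces the unbounded recv()."""
    conn.settimeout(read_timeout)
    try:
        return conn.recv(1024)
    except socket.timeout:
        return None                        # stalled peer reaped
    finally:
        conn.close()


def real_client_served(pool_size, read_timeout, wait=1.0):
    """Fill the pool with silent peers, then report whether a real request
    gets handled within `wait` seconds."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        silent_peers = []
        for _ in range(pool_size):
            server_side, peer = socket.socketpair()
            silent_peers.append(peer)      # stays connected, never sends
            pool.submit(worker, server_side, read_timeout)
        server_side, client = socket.socketpair()
        client.sendall(b"GET /")
        real = pool.submit(worker, server_side, read_timeout)  # queued behind the stalls
        try:
            return real.result(timeout=wait) == b"GET /"
        except FutureTimeout:
            return False                   # no worker came free in time
        finally:
            # Closing the silent peers delivers EOF, which unblocks any parked
            # recv() so the demo itself can shut down.
            for peer in silent_peers:
                peer.close()
            client.close()


if __name__ == "__main__":
    print(real_client_served(4, read_timeout=None, wait=0.5))  # False: pool drained
    print(real_client_served(4, read_timeout=0.2, wait=2.0))   # True: stalls reaped
```

With no read timeout the real request waits behind four parked workers and is never picked up; with a 0.2-second timeout the stalled sockets are reaped and the queued request gets served.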

Check yourself

1. Your thread-per-connection server stops responding to new clients. CPU is near idle, nothing has crashed, and there are no errors in the logs. What's the most likely cause?

2. You add a read deadline to reap stalled sockets. Which detail matters most so the thread is actually recovered?