Streaming a download

Pour the river through a cup, a sip at a time — never try to hold the whole river at once.

The idea

You're serving a 2 GB file to a client. The tempting code reads the whole file into memory, then writes it to the socket. That works on your laptop with one user. Under load, every concurrent download holds its full file in RAM at once, and the server runs out of memory.

Streaming reads the file in small, fixed-size chunks and forwards each chunk to the client before reading the next. Memory stays flat and small — one chunk at a time — no matter how big the file or how many people download it. The trade is a little more code for a server that doesn't fall over.

How it works

You open the file for reading and loop: read one chunk into a small buffer, write that chunk to the response, and repeat until the read returns nothing. The buffer is the only file data you hold, typically a few tens of kilobytes, so peak memory is independent of file size.

CHUNK = 64 * 1024  # chunk size: at most 64 KB per read

def stream_file(path, response):
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)   # read at most CHUNK bytes
            if not chunk:           # EOF: read returned empty
                break
            response.write(chunk)   # forward, then let it be freed
    response.end()
# Peak memory ~= one CHUNK, regardless of file size.

Most web frameworks expose this as a generator or a file-stream response. The key is that you never build a single object holding all the bytes.
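The generator form might look like the sketch below. It is a minimal illustration, not any particular framework's API: iter_file is a hypothetical helper name, and the temporary file exists only so the example runs on its own. A framework that accepts an iterable of bytes as a response body could, in principle, be handed such a generator directly.

```python
import os
import tempfile

CHUNK = 64 * 1024  # same chunk size as the loop above


def iter_file(path, chunk_size=CHUNK):
    """Yield the file's bytes one chunk at a time (hypothetical helper)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                return
            yield chunk


# Demo: write a small file, then consume it chunk by chunk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * (CHUNK * 2 + 10))
    demo_path = tmp.name

try:
    sizes = [len(chunk) for chunk in iter_file(demo_path)]
finally:
    os.remove(demo_path)

print(sizes)  # two full chunks, then the 10-byte remainder
```

Only the lengths are collected here, so no chunk outlives its loop iteration; that is the property the caller has to preserve.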

Cost

Signal	Load all	Stream
Peak memory per request	O(file size)	O(chunk size)
Memory under N concurrent	N × file size	N × chunk size
Time to first byte	After full read	After first chunk
Code complexity	Lower	Slightly higher
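
The memory rows can be checked on a small scale. The sketch below is one way to do it, assuming Python's standard tracemalloc module is an acceptable proxy for process memory. The 4 MB test file and the helper names (discard, load_all, stream, peak_bytes) are made up for illustration, and the "client" is just a function that drops what it receives.

```python
import os
import tempfile
import tracemalloc

CHUNK = 64 * 1024
FILE_SIZE = 4 * 1024 * 1024  # small stand-in for a large download


def discard(data):
    """Stand-in for the client socket: accept bytes and drop them."""
    return len(data)


def load_all(path):
    with open(path, "rb") as f:
        discard(f.read())  # whole file in one bytes object


def stream(path):
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            discard(chunk)


def peak_bytes(func, path):
    """Return the peak traced allocation while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * FILE_SIZE)
    demo_path = tmp.name

try:
    peak_load_all = peak_bytes(load_all, demo_path)
    peak_stream = peak_bytes(stream, demo_path)
finally:
    os.remove(demo_path)

print(f"load-all peak: {peak_load_all / 1024:.0f} KiB")
print(f"stream peak:   {peak_stream / 1024:.0f} KiB")
```

The exact figures vary by Python version and platform. The load-all peak should be at least the file size, while the streaming peak should stay near one or two chunks, since the previous chunk is released only after the next one has been read.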

Watch out for

Reading the whole file with read() and no size argument — that is the load-all bug, just hidden in a one-liner.
Building a list of all chunks and joining at the end. You've streamed the read but still hold everything; memory is no better.
Forgetting backpressure: if the client is slow and you read faster than you send, chunks queue up in a buffer and memory grows anyway. Respect the writer's drain signal.
Sending neither a Content-Length header nor chunked transfer encoding. Without one of them the client cannot tell where the body ends, and some clients hang waiting.
Leaking the file handle when the client disconnects mid-stream. Always close in a finally or use a context manager.
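
The backpressure and file-handle points can be sketched together. The example assumes a writer with write() and an awaitable drain(), which is the shape asyncio.StreamWriter has. RecordingWriter is a hypothetical stand-in used so the example runs without a network connection, and stream_with_backpressure is an illustrative name.

```python
import asyncio
import os
import tempfile

CHUNK = 64 * 1024


async def stream_with_backpressure(path, writer, chunk_size=CHUNK):
    """Send a file through writer, draining after each chunk.

    writer is assumed to provide write(data) and an awaitable drain(),
    as asyncio.StreamWriter does.
    """
    # The context manager closes the file even if drain() raises because
    # the client went away mid-stream.
    with open(path, "rb") as f:
        while True:
            # Blocking read kept for brevity; a busy server might move it
            # to a worker thread instead.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            writer.write(chunk)
            # Wait until the writer's buffer has emptied enough to accept
            # more, so unsent chunks cannot pile up in memory.
            await writer.drain()


class RecordingWriter:
    """Hypothetical stand-in for a socket writer; records the call order."""

    def __init__(self):
        self.events = []

    def write(self, data):
        self.events.append(("write", len(data)))

    async def drain(self):
        self.events.append(("drain", 0))


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * (CHUNK + 5))
    demo_path = tmp.name

demo_writer = RecordingWriter()
try:
    asyncio.run(stream_with_backpressure(demo_path, demo_writer))
finally:
    os.remove(demo_path)

print(demo_writer.events)
```

Each write is followed by a drain before the next read, so at most one chunk is waiting to be sent at any moment.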

Worked example

Suppose 200 users each download a 1 GB video at once. Load-all needs roughly 200 GB of RAM held at the same time — impossible on a normal box, so the server crashes. Streaming with a 64 KB buffer needs about 200 × 64 KB ≈ 13 MB total. Same files, same users, but one design fits in memory and the other doesn't.
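
The arithmetic is easy to reproduce. The snippet below assumes a decimal gigabyte (10^9 bytes) and a 64 KB buffer of 65,536 bytes, and it counts only file data, not socket buffers or per-request overhead.

```python
USERS = 200
FILE_SIZE = 10**9    # 1 GB per video (decimal gigabyte, an assumption)
CHUNK = 64 * 1024    # 64 KB buffer per download

load_all_bytes = USERS * FILE_SIZE  # every file fully in memory
stream_bytes = USERS * CHUNK        # one chunk per download

print(f"load-all: {load_all_bytes / 10**9:.0f} GB")  # 200 GB
print(f"stream:   {stream_bytes / 10**6:.1f} MB")    # 13.1 MB
```

Per request, the gap is the file size divided by the chunk size, roughly 15,000 to 1 for these numbers.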

Check yourself

You stream the file but collect every chunk into a list, then b"".join(chunks) at the end. Does peak memory improve?