Streaming a download

Pour the river through a cup, a sip at a time — never try to hold the whole river at once.

The idea

You're serving a 2 GB file to a client. The tempting code reads the whole file into memory, then writes it to the socket. That works on your laptop with one user. Under load, every concurrent download holds its full file in RAM at once, and the server runs out of memory.
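For contrast, here is a minimal sketch of that tempting load-all version. The function name `send_whole_file` is hypothetical, and in-memory `io.BytesIO` objects stand in for the file on disk and the client socket so that the snippet runs on its own.

```python
import io
from typing import BinaryIO


def send_whole_file(src: BinaryIO, out: BinaryIO) -> int:
    """Anti-pattern: read everything first, then write it in one go."""
    data = src.read()   # the entire file is now a single bytes object in RAM
    out.write(data)     # nothing reaches the client until the full read is done
    return len(data)    # number of bytes held in memory at once


# Stand-ins for illustration: a 1 MB "file" and an in-memory "socket".
src = io.BytesIO(b"x" * 1_000_000)
out = io.BytesIO()
held = send_whole_file(src, out)
print(held)
```

The value returned is the size of the object held per request, and it grows with the file. That growth is the problem under concurrent load.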

Streaming reads the file in small, fixed-size chunks and forwards each chunk to the client before reading the next. Memory stays flat and small — one chunk at a time — no matter how big the file or how many people download it. The trade is a little more code for a server that doesn't fall over.

How it works

Open the file for reading in binary mode and loop: read one chunk into a small buffer, write that chunk to the response, and repeat until the read returns nothing (end of file). The current chunk is the only file data you hold, typically tens of kilobytes, so peak memory is independent of file size.

CHUNK = 64 * 1024  # bytes per read (64 KB)

def stream_file(path, response):
    # Assumes `response` exposes write() and end(); adapt to your framework.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)   # read at most CHUNK bytes
            if not chunk:           # EOF: read returned empty bytes
                break
            response.write(chunk)   # forward; the chunk is freed once replaced
    response.end()
# Peak memory is about one CHUNK, regardless of file size.

Most web frameworks expose this as a generator or a file-stream response. The key is that you never build a single object holding all the bytes.
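The generator form could look something like the sketch below. The name `iter_chunks` is hypothetical, and an in-memory `io.BytesIO` stands in for a real file so the example is self-contained. How the generator is handed to a response depends on the framework (a WSGI application, for instance, returns an iterable of byte strings), so that part is left out.

```python
import io
from typing import BinaryIO, Iterator

CHUNK = 64 * 1024  # 64 KB per read


def iter_chunks(f: BinaryIO, chunk_size: int = CHUNK) -> Iterator[bytes]:
    """Yield successive chunks from an open binary file until EOF."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:        # empty bytes means end of file
            return
        yield chunk          # the caller sends this before the next read


# Stand-in for a file on disk: 200 KB of data.
source = io.BytesIO(b"x" * (200 * 1024))
sizes = [len(chunk) for chunk in iter_chunks(source)]
print(sizes)  # [65536, 65536, 65536, 8192]
```

Because the generator is lazy, the next chunk is read only when the consumer asks for it, so at most one chunk of file data is in flight per request.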

Cost

Signal                       Load all          Stream
Peak memory per request      O(file size)      O(chunk size)
Memory under N concurrent    N × file size     N × chunk size
Time to first byte           After full read   After first chunk
Code complexity              Lower             Slightly higher

Watch out for

Worked example

Suppose 200 users each download a 1 GB video at once. Load-all needs roughly 200 GB of RAM held at the same time — impossible on a normal box, so the server crashes. Streaming with a 64 KB buffer needs about 200 × 64 KB ≈ 13 MB total. Same files, same users, but one design fits in memory and the other doesn't.
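The arithmetic can be checked in a few lines. This is only a back-of-the-envelope sketch: it assumes binary units (1 GB = 1024³ bytes, 64 KB = 64 × 1024 bytes) and counts only the file data held, ignoring per-connection overhead.

```python
USERS = 200
FILE_SIZE = 1024**3      # 1 GB per video, in bytes (binary units assumed)
CHUNK = 64 * 1024        # 64 KB buffer per download

load_all_bytes = USERS * FILE_SIZE   # every download holds its whole file
streaming_bytes = USERS * CHUNK      # every download holds one chunk

print(load_all_bytes / 1024**3)      # 200.0  (GB)
print(streaming_bytes / 1024**2)     # 12.5   (MB in binary units)
print(streaming_bytes / 1_000_000)   # 13.1072 (MB in decimal units)
```

Either way of counting megabytes lands near the 13 MB figure above, about four orders of magnitude below the load-all total.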

Check yourself

You stream the file but collect every chunk into a list, then b"".join(chunks) at the end. Does peak memory improve?
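One way to check your answer is to measure it. The sketch below uses the standard-library `tracemalloc` module to compare peak allocations for two variants over the same 8 MB of data. The helper names (`peak_bytes`, `stream_only`, `collect_then_join`) are hypothetical, an `io.BytesIO` stands in for the file, and counting bytes stands in for writing to the client.

```python
import io
import tracemalloc

CHUNK = 64 * 1024
TOTAL = 8 * 1024 * 1024          # 8 MB of stand-in file data
DATA = b"x" * TOTAL              # created before tracing starts


def stream_only(data: bytes) -> int:
    """Read chunk by chunk and drop each chunk after 'sending' it."""
    src = io.BytesIO(data)
    sent = 0
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        sent += len(chunk)       # stand-in for writing to the client
    return sent


def collect_then_join(data: bytes) -> int:
    """Read chunk by chunk, but keep every chunk and join at the end."""
    src = io.BytesIO(data)
    chunks = []
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        chunks.append(chunk)     # every chunk stays alive in the list
    body = b"".join(chunks)      # plus one more full-size copy
    return len(body)


def peak_bytes(fn, data: bytes) -> int:
    """Return the peak traced allocation while running fn(data)."""
    tracemalloc.start()
    try:
        fn(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


peak_streamed = peak_bytes(stream_only, DATA)
peak_collected = peak_bytes(collect_then_join, DATA)
print(peak_streamed, peak_collected)
```

Exact numbers vary by Python version, so none are shown here. Compare the two peaks against `TOTAL` and against `CHUNK` to see which variant scales with file size.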