Split the file into chunks so a dropped connection costs you one chunk, not the whole upload.
Sending a 40 MB file as one giant request is fragile: if the network blips at 99%, the whole thing fails and you start over from byte zero. On a flaky connection, you might never finish.
Instead, slice the file into fixed-size chunks and upload them one at a time. The server tracks how many bytes it has safely committed. When the connection drops, the client just asks "how far did you get?" and resumes from the next chunk — not from the beginning. This is the core of multipart upload and the tus resumable protocol.
Three ideas carry the whole protocol. First, an upload session id ties all the chunks of one file together. Second, every chunk is addressed by its byte offset, so a PUT at a given offset is idempotent — sending the same chunk twice writes the same bytes and is harmless. Third, the server tracks a single committed offset: the high-water mark of bytes it has durably stored.
Resume is then almost free. The client asks the server for its committed offset (a HEAD request), and continues uploading from exactly there. No re-sending of bytes the server already has.
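To make the three ideas concrete, here is a minimal in-memory sketch of the server side. The `UploadSession` class and its `put` method are hypothetical names for illustration, not part of tus or any real library, and a real server would write to durable storage rather than a `bytearray`.

```python
from dataclasses import dataclass, field


@dataclass
class UploadSession:
    """In-memory stand-in for one upload session (hypothetical, for illustration)."""
    upload_id: str          # idea 1: ties all chunks of one file together
    total_size: int
    data: bytearray = field(default_factory=bytearray)

    @property
    def committed(self) -> int:
        # Idea 3: the high-water mark of bytes stored contiguously from offset 0.
        return len(self.data)

    def put(self, offset: int, chunk: bytes) -> int:
        """Write chunk at offset (idea 2) and return the new committed offset."""
        if offset > self.committed:
            # A gap would leave unwritten bytes below the high-water mark.
            raise ValueError(f"offset {offset} is past committed {self.committed}")
        if offset + len(chunk) > self.total_size:
            raise ValueError("chunk runs past the declared file size")
        # Slice assignment overwrites in place and extends only when needed,
        # so replaying the same chunk at the same offset changes nothing.
        self.data[offset:offset + len(chunk)] = chunk
        return self.committed


session = UploadSession("demo", total_size=12)
session.put(0, b"aaaa")
session.put(4, b"bbbb")
session.put(4, b"bbbb")   # duplicate PUT: harmless
print(session.committed)  # what a HEAD request would report
```

Reading `committed` plays the role of the HEAD request: the client learns the offset and continues from there.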
POST /uploads # create session
-> 201 {upload_id, chunk_size: 5*MB}
PUT /uploads/{id}?offset=0 (chunk 0) # idempotent by offset
-> 200 {committed: 1*CHUNK}
PUT /uploads/{id}?offset=1*CHUNK (chunk 1)
-> 200 {committed: 2*CHUNK}
... chunks 2, 3 ack ... # committed = 4*CHUNK
PUT /uploads/{id}?offset=4*CHUNK (chunk 4) -- connection drops --
# ---- resume ----
HEAD /uploads/{id} # ask: how far did you get?
-> 200 {committed: 4*CHUNK} # server still has 20 MB
PUT /uploads/{id}?offset=4*CHUNK (chunk 4) # retry from here, not 0
... chunks 5, 6, 7 ack ... # committed = 8*CHUNK = 40 MB
POST /uploads/{id}/complete # assemble + finalize
-> 200 {etag, size: 40*MB}
Note that the retry uses the same offset that failed: because the write is idempotent, re-sending the chunk at offset 4*CHUNK is safe even if part of it had already reached the server.
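The client side of the trace above can be sketched as a loop that asks for the committed offset after every failure. Everything here is an assumption made for illustration: `FlakyServer` is a hypothetical in-memory stand-in that drops the connection once, and `head` and `put` mimic the HEAD and PUT requests rather than calling a real HTTP library. Sizes are scaled down to bytes.

```python
class FlakyServer:
    """Hypothetical in-memory server that drops the connection exactly once."""

    def __init__(self, drop_at_offset: int):
        self.data = bytearray()
        self.drop_at_offset = drop_at_offset
        self.dropped = False

    def head(self) -> int:
        # Stand-in for HEAD /uploads/{id}: report the committed offset.
        return len(self.data)

    def put(self, offset: int, chunk: bytes) -> int:
        # Stand-in for PUT /uploads/{id}?offset=...
        if offset == self.drop_at_offset and not self.dropped:
            self.dropped = True
            # Simulated drop: in this model nothing from the chunk is stored.
            raise ConnectionError("connection dropped")
        self.data[offset:offset + len(chunk)] = chunk
        return len(self.data)


def upload(server: FlakyServer, payload: bytes, chunk_size: int,
           max_retries: int = 3) -> int:
    """Upload payload chunk by chunk; return the number of PUT attempts."""
    attempts = 0
    retries = 0
    offset = server.head()
    while offset < len(payload):
        chunk = payload[offset:offset + chunk_size]
        attempts += 1
        try:
            offset = server.put(offset, chunk)
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
            offset = server.head()  # ask: how far did you get?
    return attempts


payload = bytes(range(40))             # 40 bytes standing in for 40 MB
server = FlakyServer(drop_at_offset=20)
attempts = upload(server, payload, chunk_size=5)
print(attempts, bytes(server.data) == payload)
```

With one drop at offset 20, the loop makes nine PUT attempts: eight chunks plus one retry of the chunk that failed.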
Chunk size is the dial you tune. Smaller chunks resume more finely but cost more round trips; larger chunks are efficient but waste more work on each failed chunk.
| Chunk size | Requests for 40 MB | Re-uploaded on a drop | Note |
|---|---|---|---|
| 1 MB | 40 + overhead | ≤ 1 MB | Fine resume, more HTTP & checksum overhead |
| 5 MB | 8 + overhead | ≤ 5 MB | Common default (S3 multipart minimum) |
| 25 MB | 2 + overhead | ≤ 25 MB | Few requests, but a drop wastes a lot |
| Per-chunk checksum | +1 hash / chunk | — | CPU cost buys corruption detection on retry |
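The request counts and worst-case re-upload figures in the table follow from two small formulas. The sketch below reproduces them; `chunk_tradeoff` is a hypothetical helper, and `MB` is assumed to be a decimal megabyte to match the table's round numbers.

```python
MB = 1_000_000  # assumption: decimal megabytes, matching the table's round numbers


def chunk_tradeoff(file_size: int, chunk_size: int) -> tuple[int, int]:
    """Return (chunk requests needed, worst-case bytes re-sent after one drop)."""
    # Integer ceiling division: a short final chunk still costs one request.
    requests = (file_size + chunk_size - 1) // chunk_size
    # A drop loses at most the one chunk that was in flight.
    worst_case_resend = min(chunk_size, file_size)
    return requests, worst_case_resend


for size_mb in (1, 5, 25):
    requests, resend = chunk_tradeoff(40 * MB, size_mb * MB)
    print(f"{size_mb} MB chunks: {requests} requests, up to {resend // MB} MB re-sent")
```

The counts exclude the session-creation, HEAD, and completion requests, which the table lumps together as "overhead".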
If PUT appends instead of writing at the given offset, a duplicated chunk corrupts the file. Address every chunk by offset so re-sending is a no-op.
Chunk k starts at k * CHUNK. Mixing inclusive and exclusive ranges re-uploads or skips a chunk, so verify offset == committed before each PUT.
A 40 MB file is split into 8 chunks of 5 MB. Chunks 0–3 upload and ack, so the server's committed offset is 4 × 5 MB = 20 MB. The connection drops while chunk 4, the fifth chunk (offset 20 MB), is in flight.
Naive single PUT failing here: re-upload all 40 MB
Resumable, drop after 4 chunks: send only the remaining 20 MB
HEAD /uploads/{id} -> committed: 20 MB (= 4 chunks)
committed = 20 MB -> 20 MB / 5 MB = 4 chunks done (indices 0..3)
next chunk index = 4 (the 5th chunk of 8)
resume offset = 20 MB -> re-PUT the chunk at offset 20 MB
Watch the off-by-one: chunk index 4 is the fifth chunk. With 20 MB committed, you resume at offset 20 MB and send only the remaining half of the file, with at most one 5 MB chunk sent twice. The dropped connection cost you one chunk, not the whole upload.
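The same arithmetic can be written as a small function, which helps guard against the off-by-one. This is a sketch under stated assumptions: `resume_point` is a hypothetical helper, the upload is still in progress, the server commits whole chunks only, and `MB` is a decimal megabyte.

```python
MB = 1_000_000  # assumption: decimal megabytes


def resume_point(committed: int, chunk_size: int, file_size: int) -> tuple[int, int, int]:
    """Return (next chunk index, resume offset, bytes still to send).

    Assumes an upload in progress whose committed offset sits on a chunk boundary.
    """
    if not 0 <= committed <= file_size:
        raise ValueError("committed offset is outside the file")
    if committed % chunk_size != 0:
        raise ValueError("committed offset is not on a chunk boundary")
    next_index = committed // chunk_size      # chunks 0..next_index-1 are done
    resume_offset = next_index * chunk_size   # chunk k starts at k * chunk_size
    remaining = file_size - committed
    return next_index, resume_offset, remaining


index, offset, remaining = resume_point(20 * MB, 5 * MB, 40 * MB)
print(index, offset // MB, remaining // MB)
```

For the worked example this gives chunk index 4, a resume offset of 20 MB, and 20 MB left to send.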
The connection drops after the server has committed 20 MB of a 40 MB file (5 MB chunks). Where does the upload resume from?
Why must a chunk PUT at a given offset be idempotent?