Cut a giant file into chunks, send each one on its own, then ask the server to glue them back together.
Pushing a 5 GB video over the network as one request is fragile: one dropped connection near the end and you start the whole thing over. Multipart upload splits the object into independent parts (chunks). Each part is uploaded in its own request, and the server hands back a small receipt called an ETag.
Because the parts are independent, they can travel in parallel and in any order. A flaky connection forces you to retry only the one failed part rather than the whole file, and an interrupted upload can be resumed from the parts that already landed. When every part has landed, you complete the upload by sending the ordered list of ETags, and the server concatenates the parts into one final object.
There are three logical calls: open an upload, send each part, then complete. The upload ID ties the parts together, and each part comes back with an ETag you must keep. Only complete turns the scattered parts into a real object.
# 1) Initiate — the server returns an upload ID that ties the parts together
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
upload_id = mpu["UploadId"]
parts = []
for n, chunk in enumerate(split(file, part_size=100 * 1024 * 1024), start=1):
    # 2) Upload each part independently. Parts can go in parallel / any order.
    # If this one fails, retry JUST this part; the others are untouched.
    resp = retry(lambda: s3.upload_part(
        Bucket=bucket, Key=key, UploadId=upload_id,
        PartNumber=n, Body=chunk,
    ))
    parts.append({"PartNumber": n, "ETag": resp["ETag"]})
# 3) Complete — send the ORDERED list of ETags; server concatenates the parts
s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=upload_id,
    MultipartUpload={"Parts": sorted(parts, key=lambda p: p["PartNumber"])},
)
# On any unrecoverable failure, free the orphaned parts so they stop costing money:
# s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
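The sketch above sends parts one after another and leaves `split` and `retry` undefined. A runnable variant that uploads parts in parallel and retries only the failed part might look like the following. It is a minimal sketch under stated assumptions: `FakeStore` is a hypothetical in-memory stand-in for the object store (not a real client), its ETag is simply an MD5 of the part for illustration, and `upload_with_retry` and `parallel_upload` are illustrative helper names.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeStore:
    """Hypothetical in-memory object store. Parts listed in fail_once fail on their first attempt."""

    def __init__(self, fail_once=()):
        self._lock = threading.Lock()
        self._pending_failures = set(fail_once)
        self.parts = {}     # part number -> bytes received
        self.attempts = {}  # part number -> number of upload attempts
        self.obj = None     # assembled object, set only by complete()

    def upload_part(self, part_number, body):
        with self._lock:
            self.attempts[part_number] = self.attempts.get(part_number, 0) + 1
            if part_number in self._pending_failures:
                self._pending_failures.discard(part_number)
                raise ConnectionError(f"part {part_number} dropped")
            self.parts[part_number] = body
        # Assumption: the ETag is an MD5 of the part. Real providers may differ.
        return {"ETag": hashlib.md5(body).hexdigest()}

    def complete(self, parts):
        # Verify each receipt, then concatenate in the order the caller gave.
        for p in parts:
            stored = self.parts[p["PartNumber"]]
            if hashlib.md5(stored).hexdigest() != p["ETag"]:
                raise ValueError(f"ETag mismatch for part {p['PartNumber']}")
        self.obj = b"".join(self.parts[p["PartNumber"]] for p in parts)


def upload_with_retry(store, part_number, body, max_attempts=3):
    """Upload one part, retrying only this part on a connection error."""
    for attempt in range(1, max_attempts + 1):
        try:
            resp = store.upload_part(part_number, body)
            return {"PartNumber": part_number, "ETag": resp["ETag"]}
        except ConnectionError:
            if attempt == max_attempts:
                raise


def parallel_upload(store, data, part_size, workers=4):
    """Split data into parts, upload them concurrently, then complete in part order."""
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(upload_with_retry, store, n, chunk)
            for n, chunk in enumerate(chunks, start=1)
        ]
        parts = [f.result() for f in futures]
    parts.sort(key=lambda p: p["PartNumber"])
    store.complete(parts)
    return parts


data = bytes(range(256)) * 40          # 10,240 bytes of sample data
store = FakeStore(fail_once={3})       # part 3 fails once, then succeeds
parts = parallel_upload(store, data, part_size=1024)
```

Here part 3 is attempted twice while every other part is sent once, and the object only appears in `store.obj` after `complete` runs. Against a real service, the same structure would wrap the provider's upload call in place of `FakeStore.upload_part`.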
| What | The trade-off |
|---|---|
| Per-part overhead | Each part is its own request, and providers set a minimum part size (often ~5 MB), so tiny files gain nothing from splitting. |
| Parallelism | Independent parts upload concurrently, so wall-clock time drops roughly in proportion to the number of parallel connections, until available bandwidth becomes the bottleneck. |
| Retry cost | A failure re-sends one part (e.g. 100 MB) instead of the whole 5 GB file — the main reason multipart exists. |
| Orphaned parts | Parts uploaded but never completed or aborted sit in storage and bill you until removed. |
| Completion latency | complete makes the server concatenate parts into one object — usually fast, but not instant for very large objects. |
| Signal | Incomplete multipart uploads piling up in a bucket is a classic hidden bill — add a lifecycle rule to abort uploads older than N days. |
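The lifecycle backstop in the last row can be written as a bucket lifecycle configuration. The sketch below assumes Amazon S3 accessed through boto3; the helper name `abort_incomplete_uploads_rule`, the rule ID, the 7-day threshold, and the bucket name `my-bucket` are illustrative placeholders rather than values from the text above.

```python
def abort_incomplete_uploads_rule(days, rule_id="abort-stale-multipart-uploads"):
    """Build a lifecycle configuration that aborts multipart uploads left open for `days` days."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


config = abort_incomplete_uploads_rule(days=7)

# Applying the rule needs credentials and a real bucket, so it is left commented out:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=config,
# )
```

Note that this call replaces the bucket's existing lifecycle configuration, so in practice the new rule should be merged with any rules already in place.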
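- Call abort_multipart_upload on give-up, and add a lifecycle rule as a backstop.
- Keep every ETag together with its part number; you need both for complete.
- Sort the parts by PartNumber. Depending on the provider, a scrambled list is either rejected outright (S3 returns an InvalidPartOrder error) or assembled into a corrupt object, which is not an error you'll notice immediately.
- The object does not exist until complete succeeds.

You're uploading a 5 GB video over a flaky hotel connection, chunked into 100 MB parts, which makes 50 parts. You initiate once and get an upload ID. Parts stream up in parallel, each returning an ETag you store alongside its part number.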
Part 37 fails mid-flight. Because parts are independent, you retry only part 37 — the other 49 are already safely uploaded, so you re-send 100 MB, not 5 GB. Once all 50 ETags are collected, you call complete with the parts sorted by number, and the server stitches them into the final video.
Had the upload been abandoned instead, those 49 stored parts would keep billing you until an abort (or a lifecycle rule) cleaned them up.
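The numbers in this walkthrough can be checked with a few lines of arithmetic. This is a small sketch, with `part_count` and `parts_to_resend` as hypothetical helper names, and decimal units (1 GB = 1000³ bytes) assumed so that the result matches the 50 parts quoted above.

```python
def part_count(total_bytes, part_size):
    """Number of parts needed; the last part may be shorter than part_size."""
    return (total_bytes + part_size - 1) // part_size


def parts_to_resend(total_parts, uploaded_part_numbers):
    """Part numbers that still need uploading, given the ones already stored."""
    done = set(uploaded_part_numbers)
    return [n for n in range(1, total_parts + 1) if n not in done]


GB, MB = 1000**3, 1000**2

total = part_count(5 * GB, 100 * MB)                    # 50 parts
uploaded = [n for n in range(1, total + 1) if n != 37]  # everything except part 37
missing = parts_to_resend(total, uploaded)              # [37]
resend_bytes = len(missing) * 100 * MB                  # 100 MB instead of 5 GB

# With binary units (5 GiB split into 100 MiB parts) the split is not even,
# so a shorter final part pushes the count to 52.
total_binary = part_count(5 * 1024**3, 100 * 1024**2)
```

The same `parts_to_resend` check is what makes an interrupted upload resumable: on S3, for example, the list of already-stored part numbers could come from a `list_parts` call for the upload ID, though that wiring is an assumption and not shown here.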
Part 37 of a 50-part upload fails. What's the cheapest correct fix?
Every part returned a 200 and an ETag, but you never called complete. Where's the object?