Multipart upload that never finished

A big file uploaded in pieces, then abandoned mid-flight — the pieces linger in the bucket, unseen but still billed.

The idea

To upload a large object, S3-style storage lets you split it into independent parts and send them under one UploadId. The object only becomes real when you call CompleteMultipartUpload, which stitches the parts together.

If the client crashes or quits before that final call, the parts that already landed are neither assembled nor deleted; they become orphaned parts. A normal LIST or GET can't see them, yet they keep occupying storage and accruing cost until the upload is aborted, either by hand or by a lifecycle rule.
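
Orphaned parts are hidden from object listings, but the storage service will report them if asked directly. The sketch below is one possible way to total the bytes held by unfinished uploads, using boto3's paginators for ListMultipartUploads and ListParts. The function name find_incomplete_uploads is made up for illustration, and the client is passed in so the function can be exercised without real credentials.

```python
def find_incomplete_uploads(s3, bucket):
    """Return [(key, upload_id, stored_bytes)] for each unfinished multipart upload.

    `s3` is assumed to be a boto3 S3 client (or anything with the same
    get_paginator interface).
    """
    found = []
    uploads = s3.get_paginator("list_multipart_uploads")
    for page in uploads.paginate(Bucket=bucket):
        # "Uploads" is absent from the response when nothing is pending.
        for up in page.get("Uploads", []):
            stored = 0
            parts = s3.get_paginator("list_parts")
            for part_page in parts.paginate(
                    Bucket=bucket, Key=up["Key"], UploadId=up["UploadId"]):
                stored += sum(p["Size"] for p in part_page.get("Parts", []))
            found.append((up["Key"], up["UploadId"], stored))
    return found
```

A typical call would be find_incomplete_uploads(boto3.client("s3"), "vault"); any non-empty result is storage you are paying for that a plain LIST never shows.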

How it works

You open the upload once to get an UploadId, push each part (numbered, at least 5 MiB except the last), then call complete with the list of parts and their ETags. If anything dies before complete, the landed parts stay — so you either retry-and-complete, or abort.

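import boto3

# Helper assumed by the loop below (not defined in the original snippet):
# yields the file in fixed-size chunks; the final chunk may be shorter.
def read_chunks(path, size):
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk

s3 = boto3.client("s3")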

# 1. Open the upload — this returns the UploadId you must keep.
mpu = s3.create_multipart_upload(Bucket="vault", Key="backup.tar")
upload_id = mpu["UploadId"]

parts = []
for i, chunk in enumerate(read_chunks("backup.tar", size=100 * 1024 * 1024), start=1):
    # Each part is independent; parts can upload in parallel.
    r = s3.upload_part(Bucket="vault", Key="backup.tar",
                       PartNumber=i, UploadId=upload_id, Body=chunk)
    parts.append({"PartNumber": i, "ETag": r["ETag"]})
    # If the process crashes HERE, the uploaded parts are orphaned:
    # invisible to LIST/GET, but still stored and still billed.

# 2. Finalize — only now does the object exist and become visible.
s3.complete_multipart_upload(
    Bucket="vault", Key="backup.tar", UploadId=upload_id,
    MultipartUpload={"Parts": parts})

# Cleanup paths if you never complete:
#   manual:    s3.abort_multipart_upload(Bucket="vault", Key="backup.tar",
#                                         UploadId=upload_id)
#   automatic: a lifecycle rule on the bucket —
#     AbortIncompleteMultipartUpload: { DaysAfterInitiation: 7 }
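
The manual cleanup path above can be folded into the upload itself. What follows is a minimal sketch, not the only way to do it: the same create / upload_part / complete sequence, wrapped so that any exception triggers abort_multipart_upload before being re-raised. The function name upload_or_abort is hypothetical, and the client is passed in as a parameter.

```python
def upload_or_abort(s3, bucket, key, chunks):
    """Multipart-upload `chunks` to bucket/key; abort the upload if any step raises."""
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    try:
        parts = []
        for number, chunk in enumerate(chunks, start=1):
            r = s3.upload_part(Bucket=bucket, Key=key, PartNumber=number,
                               UploadId=upload_id, Body=chunk)
            parts.append({"PartNumber": number, "ETag": r["ETag"]})
        s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
    except BaseException:
        # Covers errors and Ctrl-C: discard the landed parts, then re-raise.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return upload_id
```

This only helps when the process is still alive to run the except block. A power loss, a kill signal, or a sleeping laptop skips it entirely, which is why the lifecycle rule remains the backstop.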

Trade-offs

Aspect	Single PUT	Multipart upload
Resumability	None — a failed PUT restarts from zero	Retry only the failed parts
Parallelism	One stream	Parts upload concurrently
Max object size	5 GiB	Up to 5 TiB
Minimum part size	n/a	5 MiB per part (last part exempt)
Cleanup burden	Nothing to leak	Must complete or abort, else orphaned parts linger
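
The size limits in the table translate into a small planning calculation. The sketch below picks a part size and part count for a given object size, respecting the 5 MiB floor from the table. It also applies a ceiling of 10,000 parts per upload, which is S3's documented limit rather than something stated in this article. The name plan_parts and the 100 MiB default are illustrative assumptions.

```python
MIB = 1024 * 1024
MIN_PART = 5 * MIB     # floor for every part except the last
MAX_PARTS = 10_000     # S3's cap on parts per upload (not from the table above)

def plan_parts(object_size, part_size=100 * MIB):
    """Return (part_size, part_count) for an object of `object_size` bytes.

    The requested part size is raised to the 5 MiB floor if needed, and
    raised again if the object would otherwise need more than MAX_PARTS parts.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    part_size = max(part_size, MIN_PART)
    # Smallest part size that fits the object into MAX_PARTS parts (ceiling division).
    smallest_allowed = -(-object_size // MAX_PARTS)
    part_size = max(part_size, smallest_allowed)
    part_count = -(-object_size // part_size)
    return part_size, part_count
```

For a 500 MiB object with the default setting this yields five parts of 100 MiB each; asking for 1 MiB parts is silently bumped to the 5 MiB floor.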

Watch out for

Silent billing. Orphaned parts keep incurring storage charges until the upload is aborted; nothing alerts you, and a plain bucket size view may not surface them.
Invisible to LIST and GET. Incomplete uploads only show up via ListMultipartUploads / ListParts — never a plain object listing.
Part-size floor. Every part except the last must be at least 5 MiB, or complete is rejected; too many tiny parts also wastes request overhead.
Keep the UploadId. Lose it and you cannot complete or abort that upload directly; you first have to recover the ID with ListMultipartUploads.
No lifecycle rule, no cleanup. Without an AbortIncompleteMultipartUpload rule, abandoned uploads never expire on their own.
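
The lifecycle rule named in the last item can be set from code as well as from the console. This is a minimal sketch using boto3's put_bucket_lifecycle_configuration; the rule ID and the helper names abort_incomplete_rule and apply_rule are invented for illustration. One caveat worth knowing: that call replaces the bucket's whole lifecycle configuration, so on a bucket that already has rules you would merge this rule into the existing set first.

```python
def abort_incomplete_rule(days=7, prefix=""):
    """Build a lifecycle configuration that aborts uploads left incomplete for `days` days."""
    return {
        "Rules": [{
            "ID": f"abort-incomplete-mpu-{days}d",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},            # "" applies to the whole bucket
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
        }]
    }

def apply_rule(s3, bucket, days=7):
    """Install the rule on `bucket`, overwriting any existing lifecycle configuration."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=abort_incomplete_rule(days))
```

Calling apply_rule(boto3.client("s3"), "vault") would give the bucket from the earlier snippet a 7-day cleanup window.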

Worked example

You back up a 500 MB archive as five 100 MB parts under one UploadId. Parts 1, 2, and 3 land cleanly, then the laptop sleeps and the process dies before upload_part sends part 4 and before the complete call.

Result: 300 MB now sits in the bucket as three orphaned parts. A LIST of the bucket shows the object as absent, because it never existed, so the 300 MB looks like it vanished, but it still bills every day. With a lifecycle rule set to DaysAfterInitiation: 7, the upload becomes eligible for automatic abort 7 days after it was initiated; lifecycle actions run asynchronously, so the actual cleanup can trail that mark slightly. Once the upload is aborted, the three parts are deleted and the 300 MB of storage is reclaimed. Without that rule, the 300 MB would bill indefinitely.
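
To put rough numbers on the example, the arithmetic can be sketched as below. The storage price is a placeholder assumption, not a figure from this article or a quoted rate; substitute your provider's actual price. The function name orphan_cost and the 30-day billing month are also simplifications made for illustration.

```python
PRICE_PER_GB_MONTH = 0.023   # ASSUMED placeholder rate in USD; replace with your real price

def orphan_cost(size_mb, days, price_per_gb_month=PRICE_PER_GB_MONTH, days_per_month=30):
    """Approximate storage charge for `size_mb` of orphaned parts held for `days` days."""
    size_gb = size_mb / 1000
    return size_gb * price_per_gb_month * days / days_per_month

# The worked example: 300 MB held for 7 days (rule in place) versus a full year (no rule).
with_rule = orphan_cost(300, 7)
without_rule = orphan_cost(300, 365)
```

The amount for a single 300 MB upload is small under any realistic rate; the concern is that the charge grows linearly with both size and time, and that abandoned uploads tend to pile up unnoticed.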

Check yourself

After three of five parts upload and the client crashes before CompleteMultipartUpload, what does a plain LIST of the bucket show, and who pays for the landed parts?

What is the reliable way to stop paying for those orphaned parts going forward?