Multipart upload that never finished

A big file uploaded in pieces, then abandoned mid-flight — the pieces linger in the bucket, unseen but still billed.

The idea

To upload a large object, S3-style storage lets you split it into independent parts and send them under one UploadId. The object only becomes real when you call CompleteMultipartUpload, which stitches the parts together.

If the client crashes or quits before that final call, the parts that already landed are neither assembled nor deleted; they become orphaned parts. A normal LIST or GET can't see them, yet they keep occupying storage and accruing cost until the upload is aborted, either explicitly or by a lifecycle rule.
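
Although ordinary listings hide them, in-progress uploads can be enumerated through separate calls. The following is a minimal sketch, assuming a boto3 S3 client is passed in as `s3` and using boto3's `list_multipart_uploads` and `list_parts` paginators; the helper name `find_incomplete_uploads` is made up for this example.

```python
def find_incomplete_uploads(s3, bucket):
    """Return one record per in-progress multipart upload in `bucket`.

    `s3` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    found = []
    upload_pages = s3.get_paginator("list_multipart_uploads").paginate(Bucket=bucket)
    for page in upload_pages:
        # "Uploads" is absent when nothing is in progress.
        for up in page.get("Uploads", []):
            stored = 0
            part_pages = s3.get_paginator("list_parts").paginate(
                Bucket=bucket, Key=up["Key"], UploadId=up["UploadId"])
            for part_page in part_pages:
                stored += sum(p["Size"] for p in part_page.get("Parts", []))
            found.append({
                "Key": up["Key"],
                "UploadId": up["UploadId"],
                "Initiated": up["Initiated"],
                "StoredBytes": stored,  # bytes held by parts that already landed
            })
    return found
```

Called as `find_incomplete_uploads(boto3.client("s3"), "vault")`, it would report each unfinished upload together with the number of bytes its parts still occupy.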

See it work

[Interactive demo: a 500 MB object is uploaded to the bucket as five parts under one multipart upload.]
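
The same sequence can be mimicked with a toy in-memory model. This only illustrates the visibility rules described above and says nothing about how S3 is implemented; the class `ToyBucket` and its methods are invented for this sketch, and 100-byte strings stand in for 100 MB parts.

```python
class ToyBucket:
    """Toy model: uploaded parts live apart from visible objects."""

    def __init__(self):
        self.objects = {}   # key -> bytes; what LIST and GET can see
        self.pending = {}   # upload_id -> {"key": str, "parts": {number: bytes}}
        self._next_id = 0

    def create_multipart_upload(self, key):
        self._next_id += 1
        upload_id = f"upload-{self._next_id}"
        self.pending[upload_id] = {"key": key, "parts": {}}
        return upload_id

    def upload_part(self, upload_id, part_number, body):
        self.pending[upload_id]["parts"][part_number] = body

    def complete_multipart_upload(self, upload_id):
        upload = self.pending.pop(upload_id)
        ordered = [upload["parts"][n] for n in sorted(upload["parts"])]
        self.objects[upload["key"]] = b"".join(ordered)

    def abort_multipart_upload(self, upload_id):
        self.pending.pop(upload_id)

    def list_objects(self):
        return sorted(self.objects)

    def billed_bytes(self):
        visible = sum(len(body) for body in self.objects.values())
        hidden = sum(len(part) for upload in self.pending.values()
                     for part in upload["parts"].values())
        return visible + hidden


bucket = ToyBucket()
uid = bucket.create_multipart_upload("backup.tar")
for n in (1, 2, 3):                 # parts 4 and 5 never arrive
    bucket.upload_part(uid, n, b"x" * 100)

print(bucket.list_objects())        # [] : the object does not exist yet
print(bucket.billed_bytes())        # 300 : the three parts are still stored
bucket.abort_multipart_upload(uid)
print(bucket.billed_bytes())        # 0 : aborting releases the parts
```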

How it works

You open the upload once to get an UploadId, push each numbered part (at least 5 MiB each, except the last), then call CompleteMultipartUpload with the list of part numbers and their ETags. If the client dies before that call, the landed parts stay, so you either retry the missing parts and complete, or abort.

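import boto3

def read_chunks(path, size):
    # Illustrative helper: yield successive `size`-byte chunks of the file.
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk

s3 = boto3.client("s3")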

# 1. Open the upload — this returns the UploadId you must keep.
mpu = s3.create_multipart_upload(Bucket="vault", Key="backup.tar")
upload_id = mpu["UploadId"]

parts = []
for i, chunk in enumerate(read_chunks("backup.tar", size=100 * 1024 * 1024), start=1):
    # Each part is independent; parts can upload in parallel.
    r = s3.upload_part(Bucket="vault", Key="backup.tar",
                       PartNumber=i, UploadId=upload_id, Body=chunk)
    parts.append({"PartNumber": i, "ETag": r["ETag"]})
    # If the process crashes HERE, the uploaded parts are orphaned:
    # invisible to LIST/GET, but still stored and still billed.

# 2. Finalize — only now does the object exist and become visible.
s3.complete_multipart_upload(
    Bucket="vault", Key="backup.tar", UploadId=upload_id,
    MultipartUpload={"Parts": parts})

# Cleanup paths if you never complete:
#   manual:    s3.abort_multipart_upload(Bucket="vault", Key="backup.tar",
#                                         UploadId=upload_id)
#   automatic: a lifecycle rule on the bucket —
#     AbortIncompleteMultipartUpload: { DaysAfterInitiation: 7 }
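
One way to make the "retry-and-complete, or abort" choice explicit is to wrap the upload so that any exception triggers an abort. The sketch below assumes a boto3 S3 client is passed in as `s3`; the function name `upload_or_abort` and the `chunks` iterable are placeholders for this example. It cannot help when the process is killed outright (a sleeping laptop, a power cut), so the lifecycle rule remains the backstop.

```python
def upload_or_abort(s3, bucket, key, chunks):
    """Upload `chunks` as one multipart object; abort the upload on any error."""
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    try:
        parts = []
        for number, chunk in enumerate(chunks, start=1):
            resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=number,
                                  UploadId=upload_id, Body=chunk)
            parts.append({"PartNumber": number, "ETag": resp["ETag"]})
        s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
    except BaseException:
        # Covers ordinary errors and Ctrl-C: release the landed parts, then re-raise.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return upload_id
```

Re-raising after the abort keeps the original failure visible to the caller, who can then decide whether to start a fresh upload.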

Trade-offs

| Aspect | Single PUT | Multipart upload |
| --- | --- | --- |
| Resumability | None: a failed PUT restarts from zero | Retry only the failed parts |
| Parallelism | One stream | Parts upload concurrently |
| Max object size | 5 GiB | Up to 5 TiB |
| Minimum part size | n/a | 5 MiB per part (last part exempt) |
| Cleanup burden | Nothing to leak | Must complete or abort, else orphaned parts linger |
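
The size limits in the table interact with one more S3 limit that the table does not list: a single upload may contain at most 10,000 parts. Below is a small sketch of picking a part size under that assumption; `plan_part_size` is a name invented here, and the constants are worth checking against the storage provider's current documentation.

```python
MIB = 1024 * 1024
MIN_PART = 5 * MIB      # minimum size of every part except the last
MAX_PARTS = 10_000      # assumed per-upload part limit


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return -(-a // b)


def plan_part_size(object_size, preferred=100 * MIB):
    """Return (part_size, part_count) that stays within the part-count limit."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Smallest part size that keeps the count at or under MAX_PARTS.
    needed = ceil_div(object_size, MAX_PARTS)
    part_size = max(preferred, needed, MIN_PART)
    return part_size, ceil_div(object_size, part_size)


print(plan_part_size(500 * MIB))   # (104857600, 5)
```

For objects near the 5 TiB ceiling, the preferred 100 MiB part size is too small to fit in 10,000 parts, so the function raises the part size instead.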

Worked example

You back up a 500 MB archive as five 100 MB parts under one UploadId. Parts 1, 2, and 3 land cleanly, then the laptop sleeps and the process dies before upload_part sends part 4 and before the complete call.

Result: 300 MB now sits in the bucket as three orphaned parts. A LIST of the bucket shows no such object, because it was never created, so the 300 MB looks like it vanished, yet it is billed every day. With a lifecycle rule set to DaysAfterInitiation: 7, the upload becomes eligible for abort 7 days after it was initiated; the three parts are then deleted and the 300 MB of storage is reclaimed (lifecycle actions run asynchronously, so the cleanup can lag slightly). Without that rule, the 300 MB would bill indefinitely.
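
The lifecycle rule from this example could be installed with boto3 roughly as follows. This is a sketch under stated assumptions: `s3` is a boto3 S3 client, the rule ID is an arbitrary label chosen here, and both function names are hypothetical. Note that `put_bucket_lifecycle_configuration` replaces the bucket's entire lifecycle configuration, so any existing rules would need to be merged in first.

```python
def abort_incomplete_rule(days=7, rule_id="abort-incomplete-multipart"):
    """Build a lifecycle rule that aborts uploads left incomplete for `days` days."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": ""},   # empty prefix: apply to the whole bucket
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
    }


def apply_abort_rule(s3, bucket, days=7):
    """Install the rule on `bucket`, overwriting its current lifecycle rules."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [abort_incomplete_rule(days)]},
    )
```

With the bucket from the earlier snippet, the call would be `apply_abort_rule(boto3.client("s3"), "vault")`.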

Check yourself

After three of five parts upload and the client crashes before CompleteMultipartUpload, what does a plain LIST of the bucket show, and who pays for the landed parts?

What is the reliable way to stop paying for those orphaned parts going forward?