A big file uploaded in pieces, then abandoned mid-flight — the pieces linger in the bucket, unseen but still billed.
To upload a large object, S3-style storage lets you split it into independent parts and send them under one UploadId. The object only becomes real when you call CompleteMultipartUpload, which stitches the parts together.
If the client crashes or quits before that final call, the parts that already landed are neither assembled nor deleted; they become orphaned parts. A normal LIST or GET can't see them, yet they keep occupying storage and accruing cost until the upload is completed or aborted, either by hand or by a lifecycle rule.
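Since a plain listing hides them, one way to see what orphaned parts are costing is to walk the in-progress uploads and add up their part sizes. The following is a minimal sketch, assuming a boto3-style S3 client is passed in; the function name `orphaned_bytes` is hypothetical, while `list_multipart_uploads` and `list_parts` are the boto3 paginator names for ListMultipartUploads and ListParts.

```python
def orphaned_bytes(client, bucket):
    """Return {(key, upload_id): stored_bytes} for every in-progress multipart upload.

    `client` is assumed to be a boto3-style S3 client (hypothetical helper).
    """
    totals = {}
    uploads = client.get_paginator("list_multipart_uploads")
    parts = client.get_paginator("list_parts")
    for page in uploads.paginate(Bucket=bucket):
        # "Uploads" is absent from the response when nothing is in progress.
        for upload in page.get("Uploads", []):
            size = 0
            for part_page in parts.paginate(
                Bucket=bucket, Key=upload["Key"], UploadId=upload["UploadId"]
            ):
                size += sum(p["Size"] for p in part_page.get("Parts", []))
            totals[(upload["Key"], upload["UploadId"])] = size
    return totals


# Usage (needs credentials and the boto3 package):
#   import boto3
#   print(orphaned_bytes(boto3.client("s3"), "vault"))
```

An upload that is still actively running also appears in this report, so the age of each upload (its `Initiated` timestamp) is worth checking before treating it as abandoned.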
You open the upload once to get an UploadId, push each part (numbered, at least 5 MiB except the last), then call complete with the list of parts and their ETags. If anything dies before complete, the landed parts stay — so you either retry-and-complete, or abort.
import boto3

s3 = boto3.client("s3")


def read_chunks(path, size):
    # Illustrative helper: yield the file's bytes in pieces of at most `size`.
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk


# 1. Open the upload. This returns the UploadId you must keep.
mpu = s3.create_multipart_upload(Bucket="vault", Key="backup.tar")
upload_id = mpu["UploadId"]

parts = []
for i, chunk in enumerate(read_chunks("backup.tar", size=100 * 1024 * 1024), start=1):
    # Each part is independent; parts can upload in parallel.
    r = s3.upload_part(Bucket="vault", Key="backup.tar",
                       PartNumber=i, UploadId=upload_id, Body=chunk)
    parts.append({"PartNumber": i, "ETag": r["ETag"]})
    # If the process crashes HERE, the uploaded parts are orphaned:
    # invisible to LIST/GET, but still stored and still billed.

# 2. Finalize. Only now does the object exist and become visible.
s3.complete_multipart_upload(
    Bucket="vault", Key="backup.tar", UploadId=upload_id,
    MultipartUpload={"Parts": parts})

# Cleanup paths if you never complete:
#   manual:    s3.abort_multipart_upload(Bucket="vault", Key="backup.tar",
#                                        UploadId=upload_id)
#   automatic: a lifecycle rule on the bucket:
#              AbortIncompleteMultipartUpload: { DaysAfterInitiation: 7 }
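The manual cleanup path can also be wired into the upload itself. The sketch below is one possible arrangement, not the only one: it assumes a boto3-style client is passed in, and the name `upload_or_abort` is hypothetical. If any step raises, it aborts the upload so the landed parts are released, then re-raises the error.

```python
def upload_or_abort(client, bucket, key, chunks):
    """Upload `chunks` as one multipart object; abort the upload if any step fails.

    `client` is assumed to be a boto3-style S3 client (hypothetical helper).
    """
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        for number, chunk in enumerate(chunks, start=1):
            resp = client.upload_part(
                Bucket=bucket, Key=key, PartNumber=number,
                UploadId=upload_id, Body=chunk,
            )
            parts.append({"PartNumber": number, "ETag": resp["ETag"]})
        client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except BaseException:
        # Best effort: release the parts that already landed, then re-raise.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return upload_id
```

This only covers failures the process lives long enough to handle. A hard kill, power loss, or a sleeping laptop never reaches the `except` branch, which is why the lifecycle rule remains the backstop.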
| Aspect | Single PUT | Multipart upload |
|---|---|---|
| Resumability | None — a failed PUT restarts from zero | Retry only the failed parts |
| Parallelism | One stream | Parts upload concurrently |
| Max object size | 5 GiB | Up to 5 TiB |
| Minimum part size | n/a | 5 MiB per part (last part exempt) |
| Cleanup burden | Nothing to leak | Must complete or abort, else orphaned parts linger |
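The size limits in the table constrain how an object can be split. As a rough sketch, the helper below picks a part size and part count for a given object size. The name `plan_parts` and the 100 MiB default are illustrative choices, and the 10,000-part cap is the limit AWS S3 documents; other S3-compatible stores may differ.

```python
MIB = 1024 * 1024
MIN_PART = 5 * MIB    # minimum size for every part except the last
MAX_PARTS = 10_000    # assumed cap on parts per upload (AWS S3's documented limit)


def plan_parts(total_bytes, preferred_part=100 * MIB):
    """Return (part_size, part_count) for a multipart upload of `total_bytes`."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Smallest part size that keeps the upload within MAX_PARTS (integer ceiling).
    floor_for_count = -(-total_bytes // MAX_PARTS)
    part_size = max(preferred_part, MIN_PART, floor_for_count)
    part_count = -(-total_bytes // part_size)
    return part_size, part_count


if __name__ == "__main__":
    size, count = plan_parts(500 * MIB)
    print(size // MIB, "MiB per part,", count, "parts")
```

For very large objects the preferred part size is raised automatically, because otherwise the part count would exceed the cap.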
Orphaned parts show up only through ListMultipartUploads / ListParts, never in a plain object listing.
Without an AbortIncompleteMultipartUpload rule, abandoned uploads never expire on their own.
You back up a 500 MB archive as five 100 MB parts under one UploadId. Parts 1, 2, and 3 land cleanly, then the laptop sleeps and the process dies before upload_part sends part 4 and before the complete call.
Result: 300 MB now sits in the bucket as three orphaned parts. A LIST of the bucket shows no object at all, because the object never existed, so the 300 MB looks like it vanished, yet it still bills every day. With a lifecycle rule set to DaysAfterInitiation: 7, the upload is aborted once it is 7 days past initiation (lifecycle actions run asynchronously, so the cleanup can lag slightly), the three parts are deleted, and the 300 MB of storage is reclaimed. Without that rule, the 300 MB would bill indefinitely.
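The lifecycle rule from this example could be set up roughly as follows. This is a sketch under assumptions: the helper names and the rule ID are hypothetical, the client is a boto3-style S3 client passed in by the caller, and the empty prefix filter is meant to apply the rule to the whole bucket.

```python
def abort_rule(days=7, rule_id="abort-incomplete-mpu"):
    """Build a lifecycle rule that aborts multipart uploads left incomplete for `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "ID": rule_id,                   # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},        # empty prefix: every key in the bucket
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
    }


def apply_abort_rule(client, bucket, days=7):
    """Install the rule on `bucket` through a boto3-style S3 client."""
    # Caution: this call replaces the bucket's entire lifecycle configuration,
    # so merge in any existing rules before using it on a real bucket.
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [abort_rule(days)]},
    )


# Usage (needs credentials and the boto3 package):
#   import boto3
#   apply_abort_rule(boto3.client("s3"), "vault", days=7)
```

Keeping the rule builder separate from the API call makes it easier to append the rule to a configuration the bucket already has.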
After three of five parts upload and the client crashes before CompleteMultipartUpload, what does a plain LIST of the bucket show, and who pays for the landed parts?
What is the reliable way to stop paying for those orphaned parts going forward?