Problem description: Incomplete multipart uploads
Amazon S3 is a great place to store your data: it’s simple, performant and inexpensive. It also offers several storage options for various use cases. So if you are using AWS, you are most probably using S3 as well.
One of the most common uses of Amazon S3 in R&D is storing development artifacts: source code packages, installation binaries and even VM images. They are usually uploaded to S3 by CI/CD jobs. Although you should define an artifact retention policy to avoid keeping outdated artifacts, there is also an extremely simple optimization that should be applied to your Amazon S3 buckets almost immediately after you create them: retention for incomplete multipart upload (MPU) objects.
Incomplete MPU objects may appear if the step of your CI/CD job that uploads artifacts to an S3 bucket fails for some reason, such as a network error. In that case, the parts that have already been uploaded stay in S3, and are billed as storage, until the upload is either completed or aborted, even if that upload session is never resumed. You may say that you don’t have many upload failures to Amazon S3, but experience shows that in an artifact bucket that has existed for over a year, incomplete MPU objects can consume up to 20 percent of total storage.
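To make the mechanism concrete, here is a toy in-memory model written in Python. It is a hypothetical illustration, not the S3 API: the class and method names are invented, and it only shows that parts uploaded under an unfinished upload keep counting as stored bytes until the upload is completed or aborted.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBucket:
    """Hypothetical in-memory model of multipart-upload storage accounting.

    Not the S3 API: it only illustrates that parts uploaded under an
    unfinished upload keep occupying storage until completed or aborted.
    """

    objects: dict = field(default_factory=dict)  # key -> size of a finished object
    uploads: dict = field(default_factory=dict)  # upload id -> {part number: part size}
    next_id: int = 0

    def initiate(self) -> str:
        """Start a new upload and return its (hypothetical) id."""
        self.next_id += 1
        upload_id = f"upload-{self.next_id}"
        self.uploads[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, size: int) -> None:
        """Store one part; it counts as storage from this moment on."""
        self.uploads[upload_id][part_number] = size

    def complete(self, upload_id: str, key: str) -> None:
        """Assemble the parts into one object and end the upload."""
        parts = self.uploads.pop(upload_id)
        self.objects[key] = sum(parts.values())

    def abort(self, upload_id: str) -> None:
        """Discard the upload and all of its stored parts."""
        del self.uploads[upload_id]

    def incomplete_bytes(self) -> int:
        return sum(sum(parts.values()) for parts in self.uploads.values())

    def total_bytes(self) -> int:
        return sum(self.objects.values()) + self.incomplete_bytes()


bucket = ToyBucket()
done = bucket.initiate()
bucket.upload_part(done, 1, 100)
bucket.upload_part(done, 2, 100)
bucket.complete(done, "build-1.zip")

failed = bucket.initiate()
bucket.upload_part(failed, 1, 100)  # the CI job dies here: never completed, never aborted
print(bucket.incomplete_bytes(), bucket.total_bytes())  # prints: 100 300
```

The abandoned upload's 100 bytes stay in the total until something calls abort on it, which is exactly the job of the lifecycle rule described below.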
How to check for incomplete MPUs?
Amazon has recently released Amazon S3 Storage Lens: https://docs.aws.amazon.com/AmazonS3/latest/dev/storage_lens.html
Storage Lens has Free and Advanced tiers. For this case, the Free tier is enough.
Navigate to Storage Lens: https://s3.console.aws.amazon.com/s3/lens
You will see a list of available dashboards, including at least one default dashboard that AWS creates for you. Open it.
If needed, filter the dashboard by a specific Region or bucket.
Then check the trend of incomplete multipart upload bytes to see whether incomplete uploads are worth worrying about.
If incomplete multipart upload bytes make up a significant share of your total storage (more than about 3%), you should optimize them.
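The 3% rule of thumb is a simple ratio of the two values read from the dashboard. A minimal Python sketch of that check follows; the function names and the sample figures (9 GB of incomplete uploads out of 120 GB in total) are hypothetical.

```python
def incomplete_mpu_share(incomplete_bytes: float, total_bytes: float) -> float:
    """Fraction of total storage taken up by incomplete multipart upload bytes."""
    if total_bytes <= 0:
        return 0.0  # empty bucket: nothing to optimize
    return incomplete_bytes / total_bytes


def should_optimize(incomplete_bytes: float, total_bytes: float,
                    threshold: float = 0.03) -> bool:
    """True when the share exceeds the 3% rule of thumb used above."""
    return incomplete_mpu_share(incomplete_bytes, total_bytes) > threshold


# Hypothetical dashboard values, both in GB (units only need to match).
print(f"{incomplete_mpu_share(9, 120):.1%}")  # prints: 7.5%
print(should_optimize(9, 120))                # prints: True
```

Any consistent unit works because only the ratio matters.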
How to optimize incomplete MPU storage?
The best way to ensure that a bucket does not accumulate incomplete MPU storage is to set a lifecycle policy on it: https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
S3 Lifecycle management is a powerful but complex feature; creating an MPU policy, however, is short and simple.
You can do this in the Amazon S3 console (see the Management tab on the bucket’s page) or with the AWS Command Line Interface (AWS CLI).
To set an MPU retention rule with the AWS CLI, do the following:
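1. Ensure that the AWS CLI is installed and configured.
2. Create the following mpu-retention.json file:
{
  "Rules": [
    {
      "ID": "MPU Retention",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}
It describes a policy that aborts every incomplete multipart upload in the bucket 7 days after the upload was initiated.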
3. Run the following command to apply the MPU retention policy to your-bucket:
aws s3api put-bucket-lifecycle-configuration --bucket your-bucket --lifecycle-configuration file://mpu-retention.json
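If you prefer to generate mpu-retention.json from a script, for example inside the same CI/CD pipeline, a minimal Python sketch is below. The function name and script name are hypothetical; the keys follow the lifecycle configuration format accepted by put-bucket-lifecycle-configuration, and the empty prefix filter makes the rule cover the whole bucket.

```python
import json


def mpu_retention_config(days: int = 7, rule_id: str = "MPU Retention") -> dict:
    """Build a lifecycle configuration that aborts incomplete multipart
    uploads `days` days after they were initiated, for the whole bucket."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: rule applies to every key
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


if __name__ == "__main__":
    # Print the JSON so it can be redirected into mpu-retention.json.
    print(json.dumps(mpu_retention_config(), indent=2))
```

For example, running `python make_mpu_config.py > mpu-retention.json` (hypothetical script name) produces a file that the command above can read via file://mpu-retention.json.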
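Now the MPU retention rule is set for your bucket, and incomplete uploads older than 7 days will be cleaned up automatically, so the issue should not build up again. Please note that AWS applies lifecycle policies once per day, so a new policy will take effect within the next 24 hours. Also note that put-bucket-lifecycle-configuration replaces the bucket’s entire lifecycle configuration: if the bucket already has lifecycle rules, include them in mpu-retention.json alongside the MPU rule instead of overwriting them.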
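To preview which uploads the rule should abort on its next run, you can save the output of `aws s3api list-multipart-uploads --bucket your-bucket` to a file and filter it by age. The Python sketch below is an approximation under stated assumptions: the function names and sample data are hypothetical, only the Key, UploadId and Initiated fields of each upload are read, and it ignores exactly when S3's daily lifecycle run happens.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_initiated(value: str) -> datetime:
    """Parse a timestamp such as '2020-11-20T10:15:30.000Z' into an aware datetime."""
    # datetime.fromisoformat() accepts a trailing 'Z' only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def uploads_past_retention(uploads: list, days: int = 7,
                           now: Optional[datetime] = None) -> list:
    """Return (Key, UploadId) pairs for uploads initiated more than `days` ago.

    `uploads` is assumed to be the "Uploads" list from
    `aws s3api list-multipart-uploads`. This only approximates the lifecycle
    rule, because S3 evaluates lifecycle rules once a day, not at the exact cutoff.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [
        (u["Key"], u["UploadId"])
        for u in uploads
        if parse_initiated(u["Initiated"]) < cutoff
    ]


# Hypothetical sample shaped like list-multipart-uploads output.
sample = [
    {"Key": "build-41.zip", "UploadId": "example-1", "Initiated": "2020-11-20T10:15:30.000Z"},
    {"Key": "build-42.zip", "UploadId": "example-2", "Initiated": "2020-11-29T08:00:00.000Z"},
]
print(uploads_past_retention(sample, now=datetime(2020, 11, 30, tzinfo=timezone.utc)))
# prints: [('build-41.zip', 'example-1')]
```

When the bucket has no incomplete uploads the CLI output may omit the "Uploads" key entirely, so reading it with `.get("Uploads", [])` after `json.load` is the safer choice.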