S3 Copy Bucket

Motivation

A bucket contains a large number of file objects (S3 objects without a trailing /) and directory objects (zero-size S3 objects with a trailing /).

After copying this bucket to another bucket with existing tools, e.g. rclone copy, the number of objects in the two buckets differs: the source bucket contains more objects than the destination bucket.

This is reproducible with aws s3 cp, rclone copy, s5cmd cp and mc cp (MinIO client).

The tested tools have built-in logic that drops directory objects. This behavior appears to be intentional: it's not a bug, it's a feature.

The example "Incomplete copy" below demonstrates this behavior.

Goal

Copy all objects quickly and reliably from one bucket to another.

Solution

The Python script s3copybucket.py copies all objects, including directory objects, from one bucket to another.
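
The general idea can be sketched as follows. This is a minimal illustration, not the actual code of s3copybucket.py: the function names iter_keys and copy_bucket are hypothetical, and the client argument is assumed to be a boto3 S3 client (or any object offering the same get_paginator and copy_object calls). Every listed key is copied server-side, so keys ending in / are not filtered out.

```python
def iter_keys(client, bucket):
    """Yield every key in the bucket, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is missing from a page that holds no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def copy_bucket(client, src_bucket, dst_bucket):
    """Copy each key server-side and return the number of objects copied.

    No key is skipped, so zero-size keys with a trailing "/" are
    copied like any other object.
    """
    copied = 0
    for key in iter_keys(client, src_bucket):
        client.copy_object(
            Bucket=dst_bucket,
            Key=key,
            CopySource={"Bucket": src_bucket, "Key": key},
        )
        copied += 1
    return copied
```

With boto3 installed, this sketch could be called as copy_bucket(boto3.client("s3"), "bucket_src", "bucket_dst"). It runs sequentially and has no error handling. A single copy_object call is also limited to objects of up to 5 GB, so larger objects would need a multipart copy.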

Example: Incomplete copy

# aws s3api list-objects-v2 --bucket bucket_src | egrep 'Key|Size'
            "Key": "dir_lvl0_a/",
            "Size": 0,
            "Key": "dir_lvl0_b/",
            "Size": 0,
            "Key": "dir_lvl0_b/dir_lvl1_a/",
            "Size": 0,
            "Key": "dir_lvl0_b/dir_lvl1_b/",
            "Size": 0,
            "Key": "dir_lvl0_b/dir_lvl1_b/file_lvl2",
            "Size": 7341822,
            "Key": "dir_lvl0_b/file_lvl1",
            "Size": 7341822,
            
# rclone copy s3:bucket_src s3:bucket_dst -v
2022/04/09 19:43:55 INFO  : dir_lvl0_b/file_lvl1: Copied (server-side copy)
2022/04/09 19:43:56 INFO  : dir_lvl0_b/dir_lvl1_b/file_lvl2: Copied (server-side copy)
2022/04/09 19:43:56 INFO  :
Transferred:       12.672 MiB / 12.672 MiB, 100%, 0 B/s, ETA -
Transferred:            2 / 2, 100%
Elapsed time:         0.6s

# aws s3api list-objects-v2 --bucket bucket_dst | egrep 'Key|Size'
            "Key": "dir_lvl0_b/dir_lvl1_b/file_lvl2",
            "Size": 7341822,
            "Key": "dir_lvl0_b/file_lvl1",
            "Size": 7341822,
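
The difference between the two listings can also be checked programmatically. The following is a small sketch under stated assumptions: the helper names is_directory_object and missing_keys are hypothetical and not part of s3copybucket.py, and the (key, size) pairs are typed in by hand from the listings above rather than fetched from S3.

```python
def is_directory_object(key, size):
    """A directory object is a zero-size object whose key ends with "/"."""
    return key.endswith("/") and size == 0


def missing_keys(src_keys, dst_keys):
    """Return the keys present in the source but absent from the destination."""
    return sorted(set(src_keys) - set(dst_keys))


# (key, size) pairs taken from the bucket_src listing
SRC = [
    ("dir_lvl0_a/", 0),
    ("dir_lvl0_b/", 0),
    ("dir_lvl0_b/dir_lvl1_a/", 0),
    ("dir_lvl0_b/dir_lvl1_b/", 0),
    ("dir_lvl0_b/dir_lvl1_b/file_lvl2", 7341822),
    ("dir_lvl0_b/file_lvl1", 7341822),
]
# (key, size) pairs taken from the bucket_dst listing after rclone copy
DST = [
    ("dir_lvl0_b/dir_lvl1_b/file_lvl2", 7341822),
    ("dir_lvl0_b/file_lvl1", 7341822),
]

sizes = dict(SRC)
lost = missing_keys([key for key, _ in SRC], [key for key, _ in DST])
for key in lost:
    kind = "directory object" if is_directory_object(key, sizes[key]) else "file object"
    print(key, "is missing:", kind)
```

For the listings above, the four missing keys are exactly the four directory objects; both file objects arrived at the destination.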

Example: Full copy

# ./s3copybucket.py bucket_src bucket_dst
Query objects information from s3://bucket_src
Copy 6 objects from s3://bucket_src to s3://bucket_dst
Press Enter to continue or CTRL-C to abort ...
Copied object: dir_lvl0_b/
Copied object: dir_lvl0_b/dir_lvl1_a/
Copied object: dir_lvl0_a/
Copied object: dir_lvl0_b/file_lvl1
Copied object: dir_lvl0_b/dir_lvl1_b/file_lvl2
Copied object: dir_lvl0_b/dir_lvl1_b/

# aws s3api list-objects-v2 --bucket bucket_dst | egrep 'Key|Size'
            "Key": "dir_lvl0_a/",
            "Size": 0,
            "Key": "dir_lvl0_b/",
            "Size": 0,
            "Key": "dir_lvl0_b/dir_lvl1_a/",
            "Size": 0,
            "Key": "dir_lvl0_b/dir_lvl1_b/",
            "Size": 0,
            "Key": "dir_lvl0_b/dir_lvl1_b/file_lvl2",
            "Size": 7341822,
            "Key": "dir_lvl0_b/file_lvl1",
            "Size": 7341822,

Limitations

  • no error handling

Requirements

Changelog

May 2022

  • Add support for Python 2

April 2022

  • Initial version

Project page and feedback

The project page is at https://www.carstengrohmann.de/s3copybucket.html.

The source code is at https://sr.ht/~carstengrohmann/S3CopyBucket.

Comments, suggestions and patches are welcome and appreciated. Please email me.

License

This software is covered by the MIT License.

Copyright (c) 2022 Carsten Grohmann mail@carstengrohmann.de

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.