Upload Large Files with Python-Swiftclient

Our storage service is based on OpenStack Swift. Swift limits the size of a single uploaded object; by default, the limit is 5 GB. However, segmentation makes the download size of a single file virtually unlimited.

To upload files larger than 5GB, you can use segmentation. It allows you to split them into smaller segments. For example, if the size of a file is 6GB, you can split it into two segments of 3GB. 

The quickest way to try out this feature is to use python-swiftclient. For information about installing and using the python-swiftclient tool, see Upload with Python-Swiftclient (https://help.ucdn.com/upload-with-python-swiftclient/).


Please review the example below:

swift upload videos --segment-size 3221225472 --segment-container videos largeVideo.mp4 --os-username 1012345 --os-tenant-name 1012345 --os-password YOUR_FTP_PASSWORD --os-auth-url https://auth.files.nl01.cloud.servers.com:5000/v3 --auth-version 3

Use the -S (--segment-size) option to specify the segment size in bytes to use when splitting a large file (e.g. 1 Gigabyte = 1073741824 Bytes, 3 Gigabytes = 3221225472 Bytes). The above command will upload largeVideo.mp4 in segments of 3 GB.
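The byte values above count 1 Gigabyte as 1024 x 1024 x 1024 bytes. As a rough sketch (the helper names are illustrative and not part of python-swiftclient), the value for --segment-size and the resulting number of segments can be worked out like this:

```python
BYTES_PER_GB = 1024 ** 3  # this article counts 1 Gigabyte as 1073741824 bytes


def gb_to_bytes(gigabytes):
    """Convert a size in gigabytes to the byte value expected by --segment-size."""
    return int(gigabytes * BYTES_PER_GB)


def segment_count(file_size, segment_size):
    """Number of segments needed to hold file_size bytes (last one may be smaller)."""
    return (file_size + segment_size - 1) // segment_size


print(gb_to_bytes(3))                                 # 3221225472
print(segment_count(gb_to_bytes(6), gb_to_bytes(3)))  # 2
```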




After you run the command, python-swiftclient will:

1. Split the file into segments of the designated size.

2. Upload all segments in parallel.

3. Create a manifest file that allows the file to be downloaded/accessed as one object, without the segmentation being noticeable.


Important: The segments need to be uploaded into the same container as the file itself in order for the resource to be accessible through the CDN (i.e. the container passed to swift upload and the value of --segment-container need to be the same).

Python-swiftclient will manage these segment files for you, deleting old segments on deletes and overwrites, etc. You can override this behavior with the --leave-segments option if desired; this is useful if you want to have multiple versions of the same large object available.


Where


swift upload

Specifies the container where the file will be uploaded (needs to be the same as --segment-container).

--segment-size

Specifies the segment size in bytes.

--segment-container

Specifies the container where the file segments will be uploaded (needs to be the same as swift upload).

--os-username

Is the first part of your FTP username.

Due to the requirements of the Object Storage service (Swift) architecture, your FTP username contains two authentication parts separated by a dot (.) character (e.g. 1012345.1012345). The first part is your OpenStack Swift username (--os-username) and the second part is your OpenStack Swift tenant name (--os-tenant-name).

In the provided example the OpenStack Swift username (--os-username) is 1012345.

--os-tenant-name

Is the second part of your FTP username.

In the provided example the OpenStack Swift tenant name (--os-tenant-name) is 1012345.

--os-password

Is the FTP password, which can be obtained through the Universal CDN Control Panel.

--os-auth-url

Is the Swift authorization URL.


Below are the authorization URLs that should be used:

https://auth.files.nl01.cloud.servers.com:5000/v3 --auth-version 3 – to connect to your Europe container.

https://auth.files.us01.cloud.servers.com:5000/v3 --auth-version 3 – to connect to your North American container.
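To show how the options described above fit together, here is a minimal sketch that assembles the swift upload command from an FTP username and a region. The function name and the region labels "europe" and "north_america" are assumptions made for this illustration; only the flags and URLs come from this article.

```python
import shlex

# Authorization URLs listed in this article, keyed by hypothetical region labels.
AUTH_URLS = {
    "europe": "https://auth.files.nl01.cloud.servers.com:5000/v3",
    "north_america": "https://auth.files.us01.cloud.servers.com:5000/v3",
}


def build_upload_command(container, file_name, ftp_username, password,
                         segment_size, region="europe"):
    """Return the argument list for a segmented `swift upload` call."""
    # The FTP username has the form "<os-username>.<os-tenant-name>".
    os_username, os_tenant_name = ftp_username.split(".", 1)
    return [
        "swift", "upload", container,
        "--segment-size", str(segment_size),
        # Same container as the upload target, as required for CDN access.
        "--segment-container", container,
        file_name,
        "--os-username", os_username,
        "--os-tenant-name", os_tenant_name,
        "--os-password", password,
        "--os-auth-url", AUTH_URLS[region],
        "--auth-version", "3",
    ]


command = build_upload_command(
    "videos", "largeVideo.mp4", "1012345.1012345",
    "YOUR_FTP_PASSWORD", 3221225472,
)
print(shlex.join(command))
```

The list can be passed to subprocess.run(command, check=True) on a machine where the swift tool is installed; printing it first makes it easy to compare against the example command earlier in this article.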


Our FTP/SFTP service automatically performs segmentation for larger files.


DLO (Dynamic Large Objects). Direct API


The segments and manifests can be uploaded directly with HTTP requests instead of having Python-swiftclient do that for you. You can just upload the segments like you would any other object and the manifest is just a zero-byte (not enforced) file with an extra X-Object-Manifest header.

All of the segments need to be in the same container, have a common object name prefix, and sort in the order in which they should be concatenated. Object names are sorted lexicographically as UTF-8 byte strings. They don’t have to be in the same container as the manifest file will be, which is useful to keep container listings clean as explained above with the Python-swiftclient.
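Because the segments are concatenated in lexicographic order of their names, numeric suffixes without zero padding can end up in the wrong order once there are ten or more segments. The sketch below illustrates the point; the naming scheme and helper names are hypothetical and chosen only for this example.

```python
def segment_name(prefix, index, width=8):
    """Build a segment object name whose sort order matches its index."""
    return f"{prefix}{index:0{width}d}"


def sorts_in_upload_order(names):
    """True if the names are already in lexicographic UTF-8 byte order."""
    encoded = [name.encode("utf-8") for name in names]
    return encoded == sorted(encoded)


unpadded = [f"sample.mp4/{i}" for i in range(1, 12)]
padded = [segment_name("sample.mp4/", i) for i in range(1, 12)]

print(sorts_in_upload_order(unpadded))  # False: ".../10" sorts before ".../2"
print(sorts_in_upload_order(padded))    # True
```

Fixed-width suffixes such as the xaa, xab, ... names produced by the split tool sort correctly for the same reason.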

The manifest file is simply a zero-byte (not enforced) file with the extra X-Object-Manifest: <container>/<prefix> header, where <container> is the container the object segments are in and <prefix> is the common prefix for all the segments.

It is recommended to upload all the segments first and then create or update the manifest. In this way, the full object won’t be available for downloading until the upload is complete. Additionally, you can upload a new set of segments to a second location and then update the manifest to point to this new location. During the upload of the new segments, the original manifest will still be available to download the first set of segments.


Please review the example below, where sample.mp4 has a size of about 96 MB (100450390 bytes):


1. Split the file. We used the split command-line tool (http://www.gnu.org/software/coreutils/split) to split the file into segments: 

split -b 30000000 sample.mp4

Since the size of the example is 100450390 bytes, the file has been split into 4 segments as follows:

xaa - 30000000 bytes
xab - 30000000 bytes
xac - 30000000 bytes
xad - 10450390 bytes
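If the split tool is not available, the same result can be approximated in Python. This is a minimal sketch, not a replacement for split: the function name is made up for this example, it reads one whole segment into memory at a time, and it only covers the first 676 two-letter suffixes (aa to zz).

```python
import itertools
import os
import string
import tempfile


def split_file(path, segment_size, out_dir, prefix="x"):
    """Write `path` as segments named like the `split -b` defaults (xaa, xab, ...)."""
    suffixes = ("".join(pair) for pair in
                itertools.product(string.ascii_lowercase, repeat=2))
    written = []
    with open(path, "rb") as source:
        while True:
            chunk = source.read(segment_size)
            if not chunk:
                break
            name = os.path.join(out_dir, prefix + next(suffixes))
            with open(name, "wb") as segment:
                segment.write(chunk)
            written.append((os.path.basename(name), len(chunk)))
    return written


# Demo on a small temporary file so the sketch runs without sample.mp4.
with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "data.bin")
    with open(source_path, "wb") as handle:
        handle.write(b"\0" * 250)
    result = split_file(source_path, 100, workdir)

print(result)  # [('xaa', 100), ('xab', 100), ('xac', 50)]
```

For the file in this example the call would be split_file("sample.mp4", 30000000, "."), which should yield the four segment sizes listed above.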


2. Upload the segments.

curl -i -T xaa -X PUT -H "X-Auth-Token: 1op7ca65a00b4bar94ff9a4cc577f67b" https://storage.files.nl01.cloud.servers.com:8080/v1/SERVERSCOM_36c657231f83401ebc46e770e0cd449h/container_name/directory_name/xaa


Where:

xaa – is the name of the first segment.

/container_name/directory_name/xaa – is the URI path along with the name of the first segment. In the example above the segment is uploaded into a separate directory named directory_name within the primary container named container_name.

To see how to obtain the X-Auth-Token and the storage URL (e.g. https://storage.files.nl01.cloud.servers.com:8080/v1/SERVERSCOM_36c657231f83401ebc46e770e0cd449h) for your storage container, please go to https://help.ucdn.com/how-to-obtain-the-x-auth-token-in-identity-api-v3/

The same pattern has been used to upload the rest of the segments – xab, xac, xad.
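The curl call above is a plain HTTP PUT with an X-Auth-Token header. As an illustration, the equivalent request can be prepared with Python's standard library. The function name is hypothetical, the token and storage URL are the placeholder values from this article, and the request is only built here, not sent.

```python
import urllib.request
from urllib.parse import quote


def build_segment_put(storage_url, container, object_name, token, data):
    """Prepare (but do not send) a PUT request that uploads one segment."""
    # quote() keeps "/" so that "directory_name/xaa" stays a nested object name.
    url = f"{storage_url}/{quote(container)}/{quote(object_name)}"
    request = urllib.request.Request(url, data=data, method="PUT")
    request.add_header("X-Auth-Token", token)
    return request


request = build_segment_put(
    "https://storage.files.nl01.cloud.servers.com:8080/v1/"
    "SERVERSCOM_36c657231f83401ebc46e770e0cd449h",
    "container_name", "directory_name/xaa",
    "1op7ca65a00b4bar94ff9a4cc577f67b",
    b"segment bytes go here",  # in practice: the content of the xaa file
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```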


To confirm that all four segments have been uploaded into the specified container we used the following swift command:

swift list container_name --verbose --os-username 1012345 --os-tenant-name 1012345 --os-password MHwfJMGbmyqabFau --os-auth-url https://auth.files.nl01.cloud.servers.com:5000/v3 --auth-version 3

Output:

directory_name
directory_name/xaa
directory_name/xab
directory_name/xac
directory_name/xad


You can find more information about the swift commands here: https://help.ucdn.com/upload-with-python-swiftclient/


3. Create the manifest file. It should be a zero-byte (not enforced) file with the extra X-Object-Manifest: <container>/<prefix> header, where <container> is the container the object segments are in and <prefix> is the common prefix for all the segments.

curl -i -XPUT -H "Content-Length: 0" -H "X-Auth-Token: 1op7ca65a00b4bar94ff9a4cc577f67b" -H "X-Object-Manifest: container_name/directory_name" https://storage.files.nl01.cloud.servers.com:8080/v1/SERVERSCOM_36c657231f83401ebc46e770e0cd449h/container_name/sample.mp4

Please note that the path in the extra X-Object-Manifest header must specify the location of the uploaded segments (-H "X-Object-Manifest: container_name/directory_name").

To confirm that the manifest file has been uploaded into the specified container we used the following swift command:

swift list container_name --verbose --os-username 1012345 --os-tenant-name 1012345 --os-password MHwfJMGbmyqabFau --os-auth-url https://auth.files.nl01.cloud.servers.com:5000/v3 --auth-version 3

Output:

sample.mp4
directory_name
directory_name/xaa
directory_name/xab
directory_name/xac
directory_name/xad

Where sample.mp4 is the manifest file and directory_name is the directory where the segments have been uploaded.

In the example above we have uploaded the segments of the sample.mp4 file into the directory_name directory, while the manifest has been uploaded directly into the container_name container.

All the segments need to be in the same storage container, have a common object name prefix, and sort in the order in which they should be concatenated. Object names are sorted lexicographically as UTF-8 byte strings. They don’t have to be in the same container as the manifest file will be, which is useful to keep container listings clean as explained above with swift.
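The DLO manifest request can be prepared from Python in the same way as the curl call. The sketch below assumes the segments live under directory_name in container_name, as in step 2; the function name is hypothetical, and the request is only built, not sent. The container and prefix are URL-encoded before they are placed in the X-Object-Manifest header.

```python
import urllib.request
from urllib.parse import quote


def build_dlo_manifest_put(storage_url, container, object_name, token,
                           segment_container, segment_prefix):
    """Prepare (but do not send) the zero-byte PUT that creates a DLO manifest."""
    url = f"{storage_url}/{quote(container)}/{quote(object_name)}"
    request = urllib.request.Request(url, data=b"", method="PUT")
    request.add_header("X-Auth-Token", token)
    request.add_header("Content-Length", "0")
    # <container>/<prefix> pointing at the uploaded segments.
    request.add_header(
        "X-Object-Manifest",
        f"{quote(segment_container)}/{quote(segment_prefix)}",
    )
    return request


request = build_dlo_manifest_put(
    "https://storage.files.nl01.cloud.servers.com:8080/v1/"
    "SERVERSCOM_36c657231f83401ebc46e770e0cd449h",
    "container_name", "sample.mp4",
    "1op7ca65a00b4bar94ff9a4cc577f67b",
    "container_name", "directory_name",
)
print(request.get_method(), request.full_url)
print(request.get_header("X-object-manifest"))  # container_name/directory_name
```

Note that urllib stores header names in capitalized form, which is why the lookup uses "X-object-manifest"; HTTP header names are case-insensitive on the wire.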


Static Large Object (SLO) 

This feature is very similar to Dynamic Large Object (DLO) support in that it allows the user to upload many objects concurrently and afterward download them as a single object. It is different in that it does not rely on eventually consistent container listings to do so. Instead, a user-defined manifest of the object segments is used.


Uploading the Manifest

After the user has uploaded the objects to be concatenated, a manifest is uploaded. The request must be a PUT with the query parameter:

?multipart-manifest=put

The body of this request will be an ordered list of segment descriptions in JSON format. The data to be supplied for each segment is:

Key: Description
path: the path to the segment object (not including account), /container/object_name
etag: (optional) the ETag given back when the segment object was PUT
size_bytes: (optional) the size of the complete segment object in bytes
range: (optional) the (inclusive) range within the object to use as a segment. If omitted, the entire object is used.


The format of the list will be:

[{"path": "/cont/object",
  "etag": "etagoftheobjectsegment",
  "size_bytes": 10485760,
  "range": "1048576-2097151"}, ...]

Please review the example below, where sample.mp4 has been split and its segments uploaded into the same directory, /container_name/directory_name/:


1. Upload the segments.

curl -i -T xaa -X PUT -H "X-Auth-Token: 1op7ca65a00b4bar94ff9a4cc577f67b" https://storage.files.nl01.cloud.servers.com:8080/v1/SERVERSCOM_36c657231f83401ebc46e770e0cd449h/container_name/directory_name/xaa

The same pattern has been used to upload the rest of the segments – xab, xac, xad. 

To confirm that all four segments have been uploaded into the specified container we have used the following swift command:

swift list container_name --verbose --os-username 1012345 --os-tenant-name 1012345 --os-password MHwfJMGbmyqabFau  --os-auth-url https://auth.files.nl01.cloud.servers.com:5000/v3 --auth-version 3

Output:

directory_name
directory_name/xaa
directory_name/xab
directory_name/xac
directory_name/xad


2. Create the manifest file (saved locally as manifest.mp4 in this example) with the following content:

[
{ 
"path": "container_name/directory_name/xaa" 
}, 
{ 
"path": "container_name/directory_name/xab" 
}, 
{ 
"path": "container_name/directory_name/xac" 
},
{ 
"path": "container_name/directory_name/xad" 
}
]

Please note that the path must specify the location of the uploaded segments.
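A manifest of this shape can also be generated rather than written by hand. The following sketch builds the JSON from an ordered list of segments and fills in the optional etag and size_bytes keys. It assumes the ETag of a segment is the MD5 hash of its content, which is the usual Swift behavior but should be checked against the ETag returned by your own uploads; the function names are made up for this example.

```python
import hashlib
import json


def describe_segment(container, object_name, data):
    """Build one manifest entry, including the optional etag and size_bytes."""
    return {
        "path": f"/{container}/{object_name}",
        "etag": hashlib.md5(data).hexdigest(),  # assumption: ETag == MD5 of content
        "size_bytes": len(data),
    }


def build_slo_manifest(container, segments):
    """segments: ordered list of (object_name, content_bytes) pairs."""
    entries = [describe_segment(container, name, data) for name, data in segments]
    return json.dumps(entries, indent=2)


# Tiny placeholder contents; real code would read the xaa, xab, ... files.
manifest = build_slo_manifest("container_name", [
    ("directory_name/xaa", b"first"),
    ("directory_name/xab", b"second"),
])
print(manifest)
```

Supplying etag and size_bytes lets the server verify each segment when the manifest is uploaded, so a mismatch is reported at that point instead of surfacing later as a corrupted download.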


3. Upload the SLO Manifest file.

curl -i -T manifest.mp4 -XPUT -H "X-Auth-Token: 1op7ca65a00b4bar94ff9a4cc577f67b" "https://storage.files.nl01.cloud.servers.com:8080/v1/SERVERSCOM_36c657231f83401ebc46e770e0cd449h/container_name/sample.mp4?multipart-manifest=put&heartbeat=on"

Example output:

HTTP/1.1 100 Continue
HTTP/1.1 202 Accepted
Content-Type: text/plain
X-Trans-Id: tx0243b9c72be64698b1728-00613f60db
X-Openstack-Request-Id: tx0243b9c72be64698b1728-00613f60db
Date: Mon, 13 Sep 2021 14:31:55 GMT
Transfer-Encoding: chunked
Etag: "d5b93f74bffb6c2e001703d84fd58ffd"
Last Modified: Mon, 13 Sep 2021 14:31:56 GMT
Response Body:
Response Status: 201 Created
Errors:
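With heartbeat=on, the HTTP status line is 202 Accepted and the final result is reported in the response body, as the output above shows. A script therefore needs to inspect the body rather than the status code. The sketch below parses the "Key: value" lines of such a body; the helper names are hypothetical and the body text is copied from the example output.

```python
def parse_heartbeat_body(body):
    """Parse the 'Key: value' lines of a heartbeat response body into a dict."""
    fields = {}
    for line in body.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def manifest_created(body):
    """True if the body reports '201 Created' as the final response status."""
    status = parse_heartbeat_body(body).get("Response Status", "")
    return status.startswith("201")


body = """Etag: "d5b93f74bffb6c2e001703d84fd58ffd"
Last Modified: Mon, 13 Sep 2021 14:31:56 GMT
Response Body:
Response Status: 201 Created
Errors:
"""
print(manifest_created(body))  # True
```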


To download the generated manifest file from our object storage servers, use the following command:

curl --output sample.txt -H "X-Auth-Token: 1op7ca65a00b4bar94ff9a4cc577f67b" "https://storage.files.nl01.cloud.servers.com:8080/v1/SERVERSCOM_36c657231f83401ebc46e770e0cd449h/container_name/sample.mp4?multipart-manifest=get"

Example output:

% Total     % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   788  100   788    0     0   1819      0 --:--:-- --:--:-- --:--:--  1815


To print the content of the downloaded manifest file, we used cat and piped its standard output to jq (a command-line JSON processor, https://stedolan.github.io/jq/):

cat sample.txt | jq
[
  {
    "hash": "e746e11a6eac1acd797e55dd757fcd8a",
    "last_modified": "2021-09-13T13:03:11.000000",
    "bytes": 30000000,
    "name": "/container_name/directory_name/xaa",
    "content_type": "application/octet-stream"
  },
  {
    "hash": "beb2380ea9e1289bf5caa2767dfcebb9",
    "last_modified": "2021-09-13T13:03:31.000000",
    "bytes": 30000000,
    "name": "/container_name/directory_name/xab",
    "content_type": "application/octet-stream"
  },
  {
    "hash": "f321f53cea68cf5d17913bceb444d009",
    "last_modified": "2021-09-13T13:04:22.000000",
    "bytes": 30000000,
    "name": "/container_name/directory_name/xac",
    "content_type": "application/octet-stream"
  },
  {
    "hash": "90c45fe87cb74a28ec40a704e8995fe9",
    "last_modified": "2021-09-13T13:04:40.000000",
    "bytes": 10450390,
    "name": "/container_name/directory_name/xad",
    "content_type": "application/octet-stream"
  }
]
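A quick sanity check on the downloaded manifest is to add up the bytes values and compare the total with the size of the original file. The sketch below does this for an abbreviated copy of the listing above (only the name and bytes keys are kept); the function name is illustrative.

```python
import json


def total_size(manifest_json):
    """Sum the 'bytes' values of all segments listed in a downloaded manifest."""
    return sum(segment["bytes"] for segment in json.loads(manifest_json))


# Abbreviated copy of the manifest shown above.
manifest_json = """
[
  {"name": "/container_name/directory_name/xaa", "bytes": 30000000},
  {"name": "/container_name/directory_name/xab", "bytes": 30000000},
  {"name": "/container_name/directory_name/xac", "bytes": 30000000},
  {"name": "/container_name/directory_name/xad", "bytes": 10450390}
]
"""
print(total_size(manifest_json))  # 100450390, the size of sample.mp4
```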


For more information about the OpenStack Large Object Support (DLO/SLO), please go to:

https://docs.openstack.org/swift/latest/api/large_objects.html

https://docs.openstack.org/swift/latest/overview_large_objects.html


