NOTE: The term "S3" is overloaded, and a conversation can quickly become confusing if participants make assumptions without verifying them. S3 is both a service offered by Amazon Web Services (AWS) and a protocol.

More accurately described as "AWS S3", the service provides off-premises cloud object storage capacity for a fee and can be thought of as "the cloud" proper. AWS S3 communicates with its object storage service using a specific protocol known as the "S3 protocol," which is also commonly referred to as simply "S3."

We recommend confirming regularly with the people you are communicating with that these distinctions are understood.

As the S3 protocol is the de facto standard for communication with both AWS S3 and various on-premises object storage solutions, it is the protocol we aim to serve. To learn more about object-based storage, please visit Wikipedia.

The following is not intended to be a full tutorial. These are basic commands to get started. Please refer to the s3api documentation for more information on using the AWS CLI.

On-premises Object Storage

Today, LC provides object storage compatible with the S3 protocol, backed by NetApp's StorageGRID product. This HPC storage offering is intended for basic individual-user use cases (read: not group data share repositories), with allotments of 4 TB or less in an HPC-adjacent instance.

Coming Soon: In the near future, LC intends to greatly expand its object storage offerings (using the S3 protocol) to include both large-capacity high-performance storage and long-term streaming media archival storage, alongside our current StorageGRID solution.

Provisioning

First, you'll need to provision some on-premises object storage capacity if you haven't done so already. This is a one-time step. LC users can self-provision by navigating to LaunchIT, a web "store front" for HPC-adjacent databases, message brokers, and object storage within LC. In the CZ and the SCF, the URL is launchit.llnl.gov. For the RZ, the URL is rzlaunchit.llnl.gov. Once logged in, look for "Object Storage Allocations" in the menu at the top.

Basic Setup

After allocating S3 object storage via LaunchIT, you will need to configure a profile using the AWS command line interface (CLI). The CLI configured below works against both LC on-premises object stores and, where the network enclave permits access, AWS S3.

On LC systems, version 2 of the AWS CLI is installed at /usr/global/awscli/aws. You can either run the aws command with the full path or add this directory to your $PATH variable, for example:

export PATH=$PATH:/usr/global/awscli

Use the following command to begin creating a profile called lc-objstore:

aws configure --profile lc-objstore

After running this command, you'll be prompted to provide the access-key and secret-key provided to you with your object storage account. Below is an example of the prompts you'll see. The values my-access-key and my-secret-key should be replaced with your information, and you can simply hit Enter to pass the prompts for "Default region name" and "Default output format":

AWS Access Key ID [None]: my-access-key

AWS Secret Access Key [None]: my-secret-key

Default region name [None]: [enter]

Default output format [None]: [enter]

After completing the above, update your profile to use a locally customized certificate bundle by running

aws configure --profile lc-objstore set ca_bundle /etc/pki/tls/cert.pem

Note that you should replace lc-objstore with the name of the profile you've created!
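
As a quick sanity check of the setup above, the sketch below looks for the three settings the profile should now hold. It assumes the AWS CLI's usual file layout, where credentials live under a [NAME] section in ~/.aws/credentials and other settings live under a [profile NAME] section in ~/.aws/config. The function name check_profile and the sample file contents are illustrative only, not part of any LC tooling.

```python
import configparser

# Illustrative file contents; the real files are ~/.aws/config and
# ~/.aws/credentials (assumed default locations).
SAMPLE_CONFIG = """
[profile lc-objstore]
ca_bundle = /etc/pki/tls/cert.pem
"""

SAMPLE_CREDENTIALS = """
[lc-objstore]
aws_access_key_id = my-access-key
aws_secret_access_key = my-secret-key
"""


def check_profile(config_text, credentials_text, profile):
    """Return the names of settings that appear to be missing for `profile`."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    credentials = configparser.ConfigParser()
    credentials.read_string(credentials_text)

    missing = []
    # In the config file, a named profile is stored as "[profile NAME]".
    if not config.has_option("profile " + profile, "ca_bundle"):
        missing.append("ca_bundle")
    # In the credentials file, the section is just "[NAME]".
    for key in ("aws_access_key_id", "aws_secret_access_key"):
        if not credentials.has_option(profile, key):
            missing.append(key)
    return missing


print(check_profile(SAMPLE_CONFIG, SAMPLE_CREDENTIALS, "lc-objstore"))  # prints []
```

To run this against your own setup, read the two files from your home directory and pass their text in place of the samples.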

Below are examples both at the shell and using the boto3 python library.

S3 at the shell

The S3 object storage provisioned for you should come with an existing bucket. In the sections below, you'll see how to list existing buckets, modify buckets (uploading and deleting objects), query buckets (listing and downloading objects), and create new buckets.

List buckets

aws s3api --profile lc-objstore --endpoint-url https://osgha.llnl.gov list-buckets

Note: replace lc-objstore with the name of the profile you've created!

After this, you'll see output like

{
    "Owner": {
        "DisplayName": "<username>",
        "ID": "<Account ID>"
    },
    "Buckets": [
        {
            "CreationDate": "2022-01-22T00:29:03.070Z",
            "Name": "cz-my-bucket"
        }
    ]
}

Note the name of your bucket from the output you see. Here the bucket name is cz-my-bucket.
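
If you want to pick the bucket name out of this output in a script rather than by eye, a minimal sketch follows. It parses a trimmed copy of the sample output above. The function name bucket_names is hypothetical, and in practice the JSON text would come from the list-buckets command itself.

```python
import json

# Trimmed copy of the sample list-buckets output shown above.
SAMPLE_OUTPUT = """
{
    "Owner": {"DisplayName": "<username>", "ID": "<Account ID>"},
    "Buckets": [
        {"CreationDate": "2022-01-22T00:29:03.070Z", "Name": "cz-my-bucket"}
    ]
}
"""


def bucket_names(list_buckets_json):
    """Extract the bucket names from list-buckets JSON output."""
    response = json.loads(list_buckets_json)
    # Fall back to an empty list in case "Buckets" is missing.
    return [bucket["Name"] for bucket in response.get("Buckets", [])]


print(bucket_names(SAMPLE_OUTPUT))  # prints ['cz-my-bucket']
```

The AWS CLI's global --query option (for example, --query 'Buckets[].Name') can do similar filtering without any Python.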

Upload an object

In the following command, you upload an object to be named my-hosts (from the --key flag) to an existing bucket cz-my-bucket under the profile lc-objstore. The object's content comes from /etc/hosts, given by the --body flag, and user-defined metadata for the object is given by --metadata:

aws s3api --profile lc-objstore --endpoint-url https://osgha.llnl.gov put-object --key my-hosts --body /etc/hosts --metadata '{"hosts":"local-cluster"}' --bucket cz-my-bucket

Note: if you try this, replace flag values with your info. For example, replace lc-objstore with the name of the profile you've created and replace cz-my-bucket with the name of your bucket!

This returns something like

{
    "ETag": "\"8b14f67903beb8db9d332a552aa60c3f\""
}
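
The ETag can serve as a quick integrity check. On AWS S3, the ETag of an object uploaded in a single part (and without KMS encryption) is the MD5 hex digest of its content. The sketch below assumes the LC object store behaves the same way for simple put-object uploads, which this page does not confirm. The helper names md5_hex and etag_matches are illustrative.

```python
import hashlib


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_matches(etag, path):
    """Compare an ETag (as returned by put-object) with a local file.

    Assumption: the ETag is the plain MD5 of the content, which does not
    hold for multipart uploads.
    """
    # The ETag value arrives wrapped in literal double quotes.
    return etag.strip('"') == md5_hex(path)


# Example, using the ETag value from the output above:
# etag_matches('"8b14f67903beb8db9d332a552aa60c3f"', "/etc/hosts")
```

A mismatch does not necessarily mean corruption. It can also mean the local file changed after upload or the store computed the ETag differently.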

List objects in a bucket

aws s3api --profile lc-objstore --endpoint-url https://osgha.llnl.gov list-objects --bucket cz-my-bucket

Note: replace lc-objstore and cz-my-bucket with your information!

Output here will look like

{
    "Contents": [
        {
            "LastModified": "2022-01-22T00:34:15.065Z",
            "ETag": "\"8b14f67903beb8db9d332a552aa60c3f\"",
            "StorageClass": "STANDARD",
            "Key": "my-hosts",
            "Owner": {
                "DisplayName": "<username>",
                "ID": "<Account ID>"
            },
            "Size": 33340
        }
    ]
}

Download an object

You can download an object via

aws s3api --profile lc-objstore --endpoint-url https://osgha.llnl.gov get-object --key my-hosts downloaded-file --bucket cz-my-bucket

Note: replace lc-objstore and cz-my-bucket with your information!

The command returns a message like

{
    "AcceptRanges": "bytes",
    "ContentType": "binary/octet-stream",
    "LastModified": "Sat, 22 Jan 2022 00:34:15 GMT",
    "ContentLength": 33340,
    "ETag": "\"8b14f67903beb8db9d332a552aa60c3f\"",
    "Metadata": {
        "hosts": "local-cluster"
    }
}

To check the downloaded file,

$ tail -1 downloaded-file
172.19.3.171  oslic21-san0 oslic21.llnl.gov oslic21
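
Beyond inspecting the last line, you can compare the downloaded file against the get-object response. The sketch below checks the file size on disk against ContentLength and returns the user-defined metadata. It uses a trimmed copy of the sample response above, and the function name summarize_download is hypothetical.

```python
import json
import os

# Trimmed copy of the sample get-object response shown above.
SAMPLE_RESPONSE = """
{
    "ContentLength": 33340,
    "ETag": "\\"8b14f67903beb8db9d332a552aa60c3f\\"",
    "Metadata": {"hosts": "local-cluster"}
}
"""


def summarize_download(get_object_json, path):
    """Check a downloaded file's size against the get-object response."""
    response = json.loads(get_object_json)
    return {
        # True when the bytes on disk match the reported object size.
        "size_ok": os.path.getsize(path) == response["ContentLength"],
        # User-defined metadata set at upload time, if any.
        "metadata": response.get("Metadata", {}),
    }


# Example: summarize_download(SAMPLE_RESPONSE, "downloaded-file")
```

A size match is only a coarse check. It confirms the transfer was not truncated, not that every byte is correct.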

Delete an object

aws s3api --profile lc-objstore --endpoint-url https://osgha.llnl.gov delete-object --key my-hosts --bucket cz-my-bucket 

Note: replace lc-objstore and cz-my-bucket with your information!

Create a bucket

To create additional buckets,

aws s3api --profile lc-objstore --endpoint-url https://osgha.llnl.gov create-bucket --bucket cz-my-bucket

Note: replace lc-objstore and cz-my-bucket with your information!

After running this, you should see something like

{
    "Location": "/cz-my-bucket"
}

printed to stdout.

Note that if you choose a bucket name that has already been used by another user, including cz-my-bucket as above, you'll see

An error occurred (BucketAlreadyExists) when calling the CreateBucket operation: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.

So, you'll have to get more creative!
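
One way to avoid collisions in a shared namespace is to append a random suffix to a personal prefix. The sketch below does that and checks the result against a subset of the AWS S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit). It assumes the LC object store follows the same rules. The function names and the prefix cz-jdoe are hypothetical.

```python
import re
import uuid

# Subset of the AWS S3 naming rules: 3-63 characters, lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if `name` passes the basic naming checks above."""
    return _BUCKET_NAME.fullmatch(name) is not None


def unique_bucket_name(prefix):
    """Build a name like '<prefix>-1a2b3c4d' that is unlikely to collide."""
    name = "{}-{}".format(prefix, uuid.uuid4().hex[:8])
    if not is_valid_bucket_name(name):
        raise ValueError("not a valid bucket name: " + name)
    return name


# Example with a hypothetical prefix; the suffix differs on every call.
print(unique_bucket_name("cz-jdoe"))
```

A random suffix makes a collision unlikely but not impossible, so the BucketAlreadyExists error above should still be handled.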

S3 with python boto3

For the full API reference, see the boto3 documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

Create a virtual environment and install boto3

python3 -m venv boto-venv
source boto-venv/bin/activate
pip install -U pip
pip install boto3

Start python, import boto3 and set up the connection details

$ python
>>> import boto3
>>> import boto3.session
>>> session = boto3.session.Session(profile_name='lc-objstore')
>>> endpoint = 'https://osgha.llnl.gov'
>>> s3 = session.resource(service_name='s3', endpoint_url=endpoint)
>>> client = s3.meta.client

Remember to replace lc-objstore with your profile name!

List buckets

>>> for bucket in s3.buckets.all():
...   print(bucket.name)
...
mybucket

Here mybucket is the name of the bucket returned.

List files in a bucket

>>> for o in s3.Bucket('mybucket').objects.all():
...   print("Key: " + o.key)
...   print("Size: " + str(o.size))
...
Key: my-hosts
Size: 40246

Remember to replace mybucket with the name of your bucket!
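
The client object created during setup is not used in the examples above. As a rough sketch of the lower-level client interface, the generator below lists keys and sizes with boto3's list_objects paginator, mirroring the list-objects shell command. The function name iter_objects is illustrative, and the code assumes only that the object passed in behaves like a boto3 S3 client.

```python
def iter_objects(client, bucket, prefix=None):
    """Yield (key, size) for each object in `bucket`, page by page."""
    kwargs = {"Bucket": bucket}
    if prefix:
        # Optional: restrict the listing to keys starting with `prefix`.
        kwargs["Prefix"] = prefix
    paginator = client.get_paginator("list_objects")
    for page in paginator.paginate(**kwargs):
        # A page has no "Contents" entry when it holds no keys.
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]


# Usage with the `client` created during setup (replace mybucket):
# for key, size in iter_objects(client, "mybucket"):
#     print(key, size)
```

The paginator requests further pages as needed, so the same loop works for buckets holding more keys than a single list call returns.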