Add S3 support to dtool

Features

  • Copy datasets to and from S3 object storage

  • List all the datasets in an S3 bucket

  • Create datasets directly in S3

Installation

To install the dtool-s3 package:

pip install dtool-s3

Configuration

Install the AWS command line client. For details see https://docs.aws.amazon.com/cli/latest/userguide/installing.html. In short:

pip install awscli --upgrade --user

Configure the credentials using:

aws configure

These credentials are needed by the boto3 library. For more details see https://boto3.readthedocs.io/en/latest/guide/quickstart.html.
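To confirm that credentials were actually stored, the file written by aws configure can be inspected. The following is a minimal sketch, not part of dtool-s3. It assumes the credentials live in the default INI file ~/.aws/credentials (they may instead come from environment variables or another provider, which this check does not see), and the helper name profile_has_keys is hypothetical.

```python
import configparser
from pathlib import Path


def profile_has_keys(credentials_path, profile="default"):
    """Return True if the given profile defines both AWS key entries."""
    parser = configparser.ConfigParser()
    # A missing file is silently skipped, leaving the parser empty.
    parser.read(Path(credentials_path).expanduser())
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return (
        "aws_access_key_id" in section
        and "aws_secret_access_key" in section
    )


if __name__ == "__main__":
    print(profile_has_keys("~/.aws/credentials"))
```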

Configuring custom endpoints

It is possible to configure buckets to make use of custom endpoints. This is useful if one wants to make use of S3 storage not hosted in AWS.

Create the file ~/.config/dtool/dtool.json and add the S3 storage account details using the format below:

{
   "DTOOL_S3_ENDPOINT_<BUCKET NAME>": "<ENDPOINT URL HERE>",
   "DTOOL_S3_ACCESS_KEY_ID_<BUCKET NAME>": "<USER NAME HERE>",
   "DTOOL_S3_SECRET_ACCESS_KEY_<BUCKET NAME>": "<KEY HERE>"
}

For example:

{
   "DTOOL_S3_ENDPOINT_my-bucket": "http://blueberry.famous.uni.ac.uk",
   "DTOOL_S3_ACCESS_KEY_ID_my-bucket": "olssont",
   "DTOOL_S3_SECRET_ACCESS_KEY_my-bucket": "some-secret-token"
}

The same settings can also be supplied as environment variables. For example, on Linux/Mac:

export DTOOL_S3_ENDPOINT_my-bucket=http://blueberry.famous.uni.ac.uk
export DTOOL_S3_ACCESS_KEY_ID_my-bucket=olssont
export DTOOL_S3_SECRET_ACCESS_KEY_my-bucket=some-secret-token
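Because the per-bucket keys follow a fixed naming pattern, they can be generated rather than typed by hand. The following is a minimal sketch, not part of dtool-s3. It assumes the key names used in the my-bucket example above and that dtool.json is a flat JSON object; the helper names bucket_config and merge_into_file are hypothetical.

```python
import json
from pathlib import Path


def bucket_config(bucket, endpoint, access_key_id, secret_access_key):
    """Return the three per-bucket settings, keyed as in the example above."""
    return {
        f"DTOOL_S3_ENDPOINT_{bucket}": endpoint,
        f"DTOOL_S3_ACCESS_KEY_ID_{bucket}": access_key_id,
        f"DTOOL_S3_SECRET_ACCESS_KEY_{bucket}": secret_access_key,
    }


def merge_into_file(config_path, new_entries):
    """Merge new_entries into a JSON config file, keeping existing keys."""
    path = Path(config_path).expanduser()
    existing = json.loads(path.read_text()) if path.exists() else {}
    existing.update(new_entries)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(existing, indent=3) + "\n")
    return existing
```

For instance, merge_into_file("~/.config/dtool/dtool.json", bucket_config("my-bucket", "http://blueberry.famous.uni.ac.uk", "olssont", "some-secret-token")) would add the three entries from the example while leaving any other settings in the file untouched.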

Usage

To copy a dataset from local disk (my-dataset) to an S3 bucket (data_raw), use the command below:

dtool copy ./my-dataset s3://data_raw

To list all the datasets in an S3 bucket, use the command below:

dtool ls s3://data_raw

See the dtool documentation for more detail.
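When the two commands above need to be driven from a script, they can be wrapped with the standard subprocess module. This is a minimal sketch under the assumption that the dtool executable is on the PATH; the helper names copy_command, ls_command and run are hypothetical and not part of dtool or dtool-s3.

```python
import subprocess


def copy_command(dataset_path, bucket):
    """Return the argument list for: dtool copy <dataset_path> s3://<bucket>"""
    return ["dtool", "copy", dataset_path, f"s3://{bucket}"]


def ls_command(bucket):
    """Return the argument list for: dtool ls s3://<bucket>"""
    return ["dtool", "ls", f"s3://{bucket}"]


def run(command):
    """Run a command and return its standard output.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        command, check=True, capture_output=True, text=True
    )
    return result.stdout
```

Calling run(ls_command("data_raw")) would then return whatever dtool ls prints for that bucket.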

Path prefix and access control

The S3 plugin supports a configurable path prefix, which can be used to control access to datasets. For example:

export DTOOL_S3_DATASET_PREFIX="u/olssont"

Alternatively one can edit the ~/.config/dtool/dtool.json file:

{
   ...,
   "DTOOL_S3_DATASET_PREFIX": "u/olssont"
}

Use the following S3 access policy to allow reading all data in the bucket but writing only to the prefixes u/<username> and dtool-:

{
  "Statement": [
    {
      "Sid": "AllowReadonlyAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:ListBucketVersions",
        "s3:GetObject",
        "s3:GetObjectTagging",
        "s3:GetObjectVersion",
        "s3:GetObjectVersionTagging"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    },
    {
      "Sid": "AllowPartialWriteAccess",
      "Effect": "Allow",
      "Action": [
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket/dtool-*",
        "arn:aws:s3:::my-bucket/u/${aws:username}/*"
      ]
    },
    {
      "Sid": "AllowListAllBuckets",
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::*"
    }
  ]
}

The user also needs write access to top-level objects that start with dtool-. These are the registration keys, which are not stored under the configured prefix. Each registration key contains the prefix under which the respective dataset is found, and is empty if no prefix is configured.
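The policy above hard-codes the bucket name my-bucket. For more than one bucket, the same document can be generated from a template. The following is a minimal sketch that reproduces the statements shown above with the bucket name as a parameter; the function name dtool_bucket_policy is hypothetical, and the output should be reviewed before being attached to any user or group.

```python
import json


def dtool_bucket_policy(bucket):
    """Build the read-all, write-to-own-prefix policy for one bucket."""
    bucket_arn = "arn:aws:s3:::" + bucket
    return {
        "Statement": [
            {
                "Sid": "AllowReadonlyAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:ListBucketVersions",
                    "s3:GetObject",
                    "s3:GetObjectTagging",
                    "s3:GetObjectVersion",
                    "s3:GetObjectVersionTagging",
                ],
                "Resource": [bucket_arn, bucket_arn + "/*"],
            },
            {
                "Sid": "AllowPartialWriteAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:DeleteObject",
                    "s3:PutObject",
                    "s3:PutObjectAcl",
                ],
                "Resource": [
                    # Top-level registration keys.
                    bucket_arn + "/dtool-*",
                    # ${aws:username} is left literal for the policy engine.
                    bucket_arn + "/u/${aws:username}/*",
                ],
            },
            {
                "Sid": "AllowListAllBuckets",
                "Effect": "Allow",
                "Action": [
                    "s3:ListAllMyBuckets",
                    "s3:GetBucketLocation",
                ],
                "Resource": "arn:aws:s3:::*",
            },
        ]
    }


if __name__ == "__main__":
    print(json.dumps(dtool_bucket_policy("my-bucket"), indent=2))
```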

Testing

Linux/Mac

All tests need the S3_TEST_BASE_URI environment variable set.

export S3_TEST_BASE_URI="s3://your-dtool-s3-test-bucket"

For the tests/test_custom_endpoint_config.py test one also needs to specify the S3_TEST_ACCESS_KEY_ID and S3_TEST_SECRET_ACCESS_KEY environment variables.

export S3_TEST_ACCESS_KEY_ID=YOUR_AWS_ACCESS_KEY
export S3_TEST_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY

To run the tests:

python setup.py develop
pytest

Windows PowerShell

All tests need the S3_TEST_BASE_URI environment variable set.

$env:S3_TEST_BASE_URI = "s3://your-dtool-s3-test-bucket"

For the tests/test_custom_endpoint_config.py test one also needs to specify the S3_TEST_ACCESS_KEY_ID and S3_TEST_SECRET_ACCESS_KEY environment variables.

$env:S3_TEST_ACCESS_KEY_ID = "YOUR_AWS_ACCESS_KEY"
$env:S3_TEST_SECRET_ACCESS_KEY = "YOUR_AWS_SECRET_ACCESS_KEY"

To run the tests:

python setup.py develop
pytest

Windows DOS

All tests need the S3_TEST_BASE_URI environment variable set.

setx S3_TEST_BASE_URI "s3://your-dtool-s3-test-bucket"

For the tests/test_custom_endpoint_config.py test one also needs to specify the S3_TEST_ACCESS_KEY_ID and S3_TEST_SECRET_ACCESS_KEY environment variables.

setx S3_TEST_ACCESS_KEY_ID YOUR_AWS_ACCESS_KEY
setx S3_TEST_SECRET_ACCESS_KEY YOUR_AWS_SECRET_ACCESS_KEY

To run the tests, open a new command prompt (variables set with setx only take effect in new sessions) and run:

python setup.py develop
pytest
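Since the same three variables are required on every platform, a small platform-independent check can catch a missing one before pytest starts. This is a minimal sketch and not part of the dtool-s3 test suite; the function name missing_test_variables is hypothetical.

```python
import os

# Needed by all tests.
REQUIRED = ("S3_TEST_BASE_URI",)
# Needed only by tests/test_custom_endpoint_config.py.
CUSTOM_ENDPOINT_ONLY = (
    "S3_TEST_ACCESS_KEY_ID",
    "S3_TEST_SECRET_ACCESS_KEY",
)


def missing_test_variables(environ=None, include_custom_endpoint=True):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    names = REQUIRED
    if include_custom_endpoint:
        names = names + CUSTOM_ENDPOINT_ONLY
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_test_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```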

