s3 0.1.3

Python module which connects to Amazon's S3 REST API

Overview

s3 is a connector to the REST API of S3, Amazon’s Simple Storage Service.

Use it to upload, download, delete, or copy files in S3, to test whether they exist, or to update their metadata.

S3 files may have metadata in addition to their content. Metadata is a set of key/value pairs. Metadata may be set when the file is uploaded or it can be updated subsequently.

Installation

From PyPI

$ pip install s3

From source

$ hg clone ssh://hg@bitbucket.org/prometheus/s3
$ pip install -e s3

The installation is successful if you can import s3. The following command must produce no errors:

$ python -c 'import s3'

API to remote storage

S3 Filenames

An S3 file name consists of a bucket and a key. This pair of strings uniquely identifies the file within S3.

The S3Name class is instantiated with a key and a bucket; the key is required and the bucket defaults to None.

The Storage class methods take a remote_name argument, which can be either a string (the key) or an instance of the S3Name class. When no bucket is given (or the bucket is None), the default_bucket established when the connection is instantiated is used. If there is no default bucket either, a ValueError is raised.

In other words, the S3Name class provides a means of using a bucket other than the default_bucket.
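
The bucket-resolution rule described here can be sketched in a few lines. This is only an illustration of the rule, not the library's code: Name is a stand-in for S3Name, and resolve is a hypothetical helper that does not exist in s3.

```python
from collections import namedtuple

# Stand-in for s3.S3Name, for illustration only: a key plus an optional bucket.
Name = namedtuple("Name", ["key", "bucket"])


def resolve(remote_name, default_bucket=None):
    """Return the (bucket, key) pair that a remote_name refers to."""
    if isinstance(remote_name, str):
        # A plain string is just the key; no bucket was given.
        key, bucket = remote_name, None
    else:
        key, bucket = remote_name.key, remote_name.bucket
    if bucket is None:
        # Fall back to the default bucket when none was given.
        bucket = default_bucket
    if bucket is None:
        raise ValueError("no bucket given and no default bucket configured")
    return bucket, key


print(resolve("example-in-s3", default_bucket="my_default_bucket"))
print(resolve(Name("example-in-s3", "my_other_bucket"), "my_default_bucket"))
```

A string key resolves to the default bucket, while a name that carries its own bucket overrides it.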

Headers and Metadata

Additional HTTP headers may be sent using the methods which write data. These methods accept an optional headers argument, which is a Python dict. The headers control various aspects of how the file may be handled. S3 supports a variety of headers which are not discussed here; see Amazon’s S3 documentation for more information.

Headers whose key begins with the special prefix x-amz-meta- are treated as metadata headers and are used to set the metadata attributes of the file.

The methods which read files also return the metadata, which consists of only those response headers that begin with x-amz-meta-.
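
As a rough illustration of the prefix rule, the filtering could look like the sketch below. The helper name metadata_from_headers and the sample headers are made up for this example. The case-insensitive comparison is an assumption based on HTTP header names being case-insensitive; the library itself may compare differently.

```python
META_PREFIX = "x-amz-meta-"


def metadata_from_headers(headers):
    """Keep only the headers whose name starts with the metadata prefix."""
    return {
        name: value
        for name, value in headers.items()
        # Assumption: compare in lower case, as HTTP header names are
        # case-insensitive.
        if name.lower().startswith(META_PREFIX)
    }


# Hypothetical response headers, for illustration only.
response_headers = {
    "Content-Type": "text/plain",
    "Content-Length": "42",
    "x-amz-meta-state": "unprocessed",
}
print(metadata_from_headers(response_headers))
```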

Storage Methods

The arguments remote_source, remote_destination, and remote_name may be either a string, or an S3Name instance.

local_name is a string and is the name of the file on the local system. This string is passed directly to open().

headers is a Python dict used to encode additional request headers.

All methods return normally on success and raise StorageError on failure.

storage.copy(remote_source, remote_destination, headers={})
    Copy remote_source to remote_destination. The destination metadata is taken from headers when headers contains metadata; otherwise it is copied from the source metadata.
storage.delete(remote_name)
    Delete remote_name from storage.
exists, metadata = storage.exists(remote_name)
    Test whether remote_name exists in storage and, if it does, retrieve its metadata. exists is a boolean; metadata is a dict.
metadata = storage.read(remote_name, local_name)
    Download remote_name from storage, save it locally as local_name, and retrieve its metadata. metadata is a dict.
storage.update_metadata(remote_name, headers)
    Update (replace) the metadata associated with remote_name with the metadata headers in headers.
storage.write(local_name, remote_name, headers={})
    Upload local_name to storage as remote_name, and set its metadata if any metadata headers are in headers.
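
The methods can be combined into larger operations. The sketch below builds a "move" out of copy and delete. The function move is a hypothetical helper, not part of s3; it only relies on the copy and delete signatures listed here and works with any object that provides them.

```python
def move(storage, remote_source, remote_destination, headers=None):
    """Copy remote_source to remote_destination, then delete the source."""
    # With no metadata headers, copy() carries over the source metadata.
    storage.copy(remote_source, remote_destination, headers or {})
    # Only reached if copy() did not raise, so a failed copy never
    # removes the source file.
    storage.delete(remote_source)
```

Because the methods raise StorageError on failure, a failed copy leaves the source untouched.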

Usage

Configuration

First, configure your YAML file.

  • access_key_id and secret_access_key are generated by the S3 account manager. They are effectively the username and password for the account.
  • default_bucket is the name of the default bucket to use when referencing S3 files. Bucket names must be globally unique across all of S3, so by convention we use a prefix on all our bucket names: com.prometheus.
  • endpoint is the Amazon server URL to connect to. See http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of the available endpoints.
  • tls selects the protocol: True means use https://, False means use http://. The default is True.

Here is an example s3.yaml:

---
s3:
    access_key_id: "XXXXX"
    secret_access_key: "YYYYYYY"
    default_bucket: "ZZZZZZZ"
    endpoint: "s3-us-west-2.amazonaws.com"
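
Before connecting, it can help to check the loaded configuration. The following is a minimal sketch, not part of s3: check_config is a hypothetical helper, the choice of required keys is an assumption (default_bucket is treated as optional), and the returned URL only illustrates how tls selects the scheme.

```python
# Assumption: these three settings are needed for any connection.
REQUIRED_KEYS = ("access_key_id", "secret_access_key", "endpoint")


def check_config(config):
    """Validate the 's3' section and return the base URL it implies."""
    section = config["s3"]
    missing = [key for key in REQUIRED_KEYS if not section.get(key)]
    if missing:
        raise ValueError("missing s3 settings: " + ", ".join(missing))
    # tls defaults to True, which selects https://.
    scheme = "https" if section.get("tls", True) else "http"
    return "%s://%s" % (scheme, section["endpoint"])


# The same values as the example s3.yaml, already loaded into a dict.
config = {
    "s3": {
        "access_key_id": "XXXXX",
        "secret_access_key": "YYYYYYY",
        "default_bucket": "ZZZZZZZ",
        "endpoint": "s3-us-west-2.amazonaws.com",
    }
}
print(check_config(config))
```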

Next configure your S3 bucket permissions. Eventually, s3 will support bucket management. Until then use Amazon’s web interface:

  • Log onto your Amazon account.
  • Create a bucket or click on an existing bucket.
  • Click on Properties.
  • Click on Permissions.
  • Click on Edit Bucket Policy.

Here is an example policy with the required permissions:

{
        "Version": "2008-10-17",
        "Id": "Policyxxxxxxxxxxxxx",
        "Statement": [
                {
                        "Sid": "Stmtxxxxxxxxxxxxx",
                        "Effect": "Allow",
                        "Principal": {
                                "AWS": "arn:aws:iam::xxxxxxxxxxxx:user/XXXXXXX"
                        },
                        "Action": [
                                "s3:AbortMultipartUpload",
                                "s3:GetObjectAcl",
                                "s3:GetObjectVersion",
                                "s3:DeleteObject",
                                "s3:DeleteObjectVersion",
                                "s3:GetObject",
                                "s3:PutObjectAcl",
                                "s3:PutObjectVersionAcl",
                                "s3:ListMultipartUploadParts",
                                "s3:PutObject",
                                "s3:GetObjectVersionAcl"
                        ],
                        "Resource": [
                                "arn:aws:s3:::com.prometheus.cgtest-1/*",
                                "arn:aws:s3:::com.prometheus.cgtest-1"
                        ]
                }
        ]
}
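
If you manage several buckets, a policy of this shape can be generated rather than edited by hand. The sketch below is one way to do that with the standard json module. bucket_policy is a hypothetical helper, and it leaves out the Id and Sid fields on the assumption that they are optional identifiers.

```python
import json

# The actions listed in the example policy.
ACTIONS = [
    "s3:AbortMultipartUpload",
    "s3:GetObjectAcl",
    "s3:GetObjectVersion",
    "s3:DeleteObject",
    "s3:DeleteObjectVersion",
    "s3:GetObject",
    "s3:PutObjectAcl",
    "s3:PutObjectVersionAcl",
    "s3:ListMultipartUploadParts",
    "s3:PutObject",
    "s3:GetObjectVersionAcl",
]


def bucket_policy(bucket, principal_arn):
    """Build a policy document granting ACTIONS on a single bucket."""
    return {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ACTIONS,
                # Both the objects in the bucket and the bucket itself.
                "Resource": [
                    "arn:aws:s3:::%s/*" % bucket,
                    "arn:aws:s3:::%s" % bucket,
                ],
            }
        ],
    }


policy = bucket_policy(
    "com.prometheus.cgtest-1",
    "arn:aws:iam::xxxxxxxxxxxx:user/XXXXXXX",  # placeholder from the example
)
print(json.dumps(policy, indent=4))
```

The printed JSON can then be pasted into the bucket policy editor.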

Examples

Once the YAML file is configured and the bucket policy is set, you can instantiate an S3Connection and use that connection to instantiate a Storage instance.

import s3
import yaml

with open('s3.yaml', 'r') as fi:
    # safe_load parses plain YAML without constructing arbitrary Python objects
    config = yaml.safe_load(fi)

connection = s3.S3Connection(**config['s3'])
storage = s3.Storage(connection)

Then you call methods on the Storage instance.

The following code uploads a file named “example” from the local filesystem as “example-in-s3” in S3. It then checks that “example-in-s3” exists in storage, downloads the file as “example-from-s3”, compares the original with the downloaded copy to ensure they are the same, deletes “example-in-s3”, and finally checks that it is no longer in storage.

import subprocess
try:
    storage.write("example", "example-in-s3")
    exists, metadata = storage.exists("example-in-s3")
    assert exists
    metadata = storage.read("example-in-s3", "example-from-s3")
    assert 0 == subprocess.call(['diff', "example", "example-from-s3"])
    storage.delete("example-in-s3")
    exists, metadata = storage.exists("example-in-s3")
    assert not exists
except s3.StorageError as e:  # assumes StorageError is exported by the s3 package
    print('failed: %s' % e)
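
The comparison step shells out to diff, which may not be available on every system. As a possible alternative, the standard library's filecmp module can compare the two local files directly. same_content is a hypothetical helper name chosen for this sketch.

```python
import filecmp


def same_content(path_a, path_b):
    """Return True if the two local files are byte-for-byte identical."""
    # shallow=False forces a content comparison instead of relying on
    # the os.stat() signatures (type, size, modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)
```

With this helper, the diff call could be replaced by assert same_content("example", "example-from-s3").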

The following code again uploads “example” as “example-in-s3”. This time it uses the bucket “my_other_bucket” explicitly, and it sets some metadata and checks that the metadata is set correctly. Then it changes the metadata and checks that as well.

headers = {
    'x-amz-meta-state': 'unprocessed',
    }
remote_name = s3.S3Name("example-in-s3", bucket="my_other_bucket")
try:
    storage.write("example", remote_name, headers=headers)
    exists, metadata = storage.exists(remote_name)
    assert exists
    assert metadata == headers
    headers['x-amz-meta-state'] = 'processed'
    storage.update_metadata(remote_name, headers)
    metadata = storage.read(remote_name, "example-from-s3")
    assert metadata == headers
except s3.StorageError as e:  # assumes StorageError is exported by the s3 package
    print('failed: %s' % e)
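
Because update_metadata replaces the metadata rather than merging it, changing a single attribute while keeping the others means starting from the current metadata. The sketch below shows one way to do that. mark_processed is a hypothetical helper, and it assumes only the exists and update_metadata behaviour described under Storage Methods.

```python
def mark_processed(storage, remote_name):
    """Set x-amz-meta-state to 'processed'; return True if it was changed."""
    exists, metadata = storage.exists(remote_name)
    if not exists:
        return False
    if metadata.get('x-amz-meta-state') == 'processed':
        # Nothing to do, so avoid a needless request.
        return False
    # update_metadata() replaces all metadata, so copy the current
    # values first to keep the other attributes.
    updated = dict(metadata)
    updated['x-amz-meta-state'] = 'processed'
    storage.update_metadata(remote_name, updated)
    return True
```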
 
File                     Type     Py Version   Uploaded on   Size
s3-0.1.3.tar.gz (md5)    Source                2014-06-11    13KB