
Project description

DataFS Distributed Data Management System


DataFS is an abstraction layer for data storage systems. It manages file versions and metadata using document-based storage systems (for now it supports DynamoDB and MongoDB) and relies on PyFilesystem to abstract file storage, allowing you to store files locally or in the cloud through a single, seamless interface.

Features

  • Explicit version and metadata management for teams

  • Unified read/write interface across file systems

  • Easily create out-of-the-box configuration files for users

Usage

DataFS is built on the concept of “archives,” which are like files but with some additional features. Archives can track versions explicitly, can live on remote servers, and can be cached locally.

To interact with DataFS, you need to create an API object. This can be done in a number of ways, both within Python and through spec files that let users work with archives out of the box. See specifying DataAPI objects for more detail.
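
As an illustration, here is a minimal sketch of building an API object in Python with a MongoDB manager and a local PyFilesystem authority. The class names and arguments follow the DataFS quickstart documentation, but treat them as assumptions and adapt them to your own storage setup:

>>> from datafs import DataAPI
>>> from datafs.managers.manager_mongo import MongoDBManager
>>> from fs.osfs import OSFS
>>>
>>> # user metadata attached to archives created through this API
>>> api = DataAPI(username='My Name', contact='my.email@example.com')
>>>
>>> # the manager holds archive metadata and version history
>>> manager = MongoDBManager(database_name='MyDatabase', table_name='DataFiles')
>>> manager.create_archive_table('DataFiles', raise_on_err=False)
>>> api.attach_manager(manager)
>>>
>>> # an "authority" is a PyFilesystem object where file contents live
>>> api.attach_authority('local', OSFS('~/datafs'))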

We’ll assume we already have an API object created. Once you have this, you can start using DataFS to create and use archives:

>>> my_archive = api.create_archive('my_archive', description='test data')
>>> my_archive.metadata
{'description': 'test data'}
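
Metadata can also be changed after creation. The following is a sketch based on the archive update_metadata method in the DataFS docs; it assumes new keys are merged into the existing metadata record:

>>> my_archive.update_metadata({'source': 'test suite'})
>>> my_archive.metadata
{'description': 'test data', 'source': 'test suite'}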

Archives can be read from and written to much like a normal file:

>>> with my_archive.open('w+') as f:
...     f.write(u'test archive contents')
...
>>> with my_archive.open('r') as f:
...     print(f.read())
...
test archive contents
>>>
>>> with my_archive.open('w+') as f:
...     f.write(u'new archive contents')
...
>>> with my_archive.open('r') as f:
...     print(f.read())
...
new archive contents

By default, archives track versions explicitly. This can be turned off (so that old versions are overwritten) using the flag versioned=False in create_archive; a short sketch follows the next example. Writes bump the patch version by default, but this can be overridden with the bumpversion argument on any write operation:

>>> my_archive.get_versions()
['0.0.1', '0.0.2']
>>>
>>> with my_archive.open('w+', bumpversion='major') as f:
...     f.write(u'a major improvement')
...
>>> my_archive.get_versions()
['0.0.1', '0.0.2', '1.0']
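
For comparison, the versioned=False flag mentioned above creates an archive whose contents are simply overwritten on each write. A minimal sketch (the archive name 'scratch_data' is hypothetical):

>>> scratch = api.create_archive('scratch_data', versioned=False)
>>> with scratch.open('w+') as f:
...     f.write(u'scratch contents, replaced by the next write')
...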

We can also retrieve versioned data specifically:

>>> with my_archive.open('r', version='0.0.2') as f:
...     print(f.read())
...
new archive contents
>>>
>>> with my_archive.open('r', version='1.0') as f:
...     print(f.read())
...
a major improvement
>>>

See examples for more extensive use cases.

Todo

See the issue tracker to view and add to our todos.

Credits

This package was created by Justin Simcock and Michael Delgado of the Climate Impact Lab. Check us out on GitHub.

Thanks also to audreyr for the wonderful cookiecutter package, and to pyup, a constant source of inspiration and our third contributor.

History

0.1.0 (2016-11-18)

  • First release on PyPI.
