
Project description

DataFS Distributed Data Management System


DataFS is an abstraction layer for data storage systems. It manages file versions and metadata using document-based storage systems (for now it supports DynamoDB and MongoDB) and relies on PyFilesystem to abstract file storage, allowing you to store files locally or in the cloud through a single, seamless interface.

Features

  • Explicit version and metadata management for teams

  • Unified read/write interface across file systems

  • Easily create out-of-the-box configuration files for users

Usage

DataFS is built on the concept of “archives,” which are like files but with some additional features. Archives can track versions explicitly, can live on remote servers, and can be cached locally.

To interact with DataFS, you need to create an API object. This can be done in a number of ways, both within Python and through spec files that let users work with archives out of the box. See specifying DataAPI objects for more detail.
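
As an illustration, here is a minimal sketch of building an API object in Python with a MongoDB manager and a local PyFilesystem authority. The class names and arguments follow the DataFS quickstart documentation, but treat them as assumptions and adapt them to your own storage setup:

>>> from datafs import DataAPI
>>> from datafs.managers.manager_mongo import MongoDBManager
>>> from fs.osfs import OSFS
>>>
>>> # user metadata attached to archives created through this API
>>> api = DataAPI(username='My Name', contact='my.email@example.com')
>>>
>>> # the manager holds archive metadata and version history
>>> manager = MongoDBManager(database_name='MyDatabase', table_name='DataFiles')
>>> manager.create_archive_table('DataFiles', raise_on_err=False)
>>> api.attach_manager(manager)
>>>
>>> # an "authority" is a PyFilesystem object where file contents live
>>> api.attach_authority('local', OSFS('~/datafs'))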

We’ll assume we already have an API object created. Once you have this, you can start using DataFS to create and use archives:

>>> my_archive = api.create_archive('my_archive', description='test data')
>>> my_archive.metadata
{'description': 'test data'}
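
Metadata can also be changed after creation. The following is a sketch based on the archive update_metadata method in the DataFS docs; it assumes new keys are merged into the existing metadata record:

>>> my_archive.update_metadata({'source': 'test suite'})
>>> my_archive.metadata
{'description': 'test data', 'source': 'test suite'}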

Archives can be read from and written to much like a normal file:

>>> with my_archive.open('w+') as f:
...     f.write(u'test archive contents')
...
>>> with my_archive.open('r') as f:
...     print(f.read())
...
test archive contents
>>>
>>> with my_archive.open('w+') as f:
...     f.write(u'new archive contents')
...
>>> with my_archive.open('r') as f:
...     print(f.read())
...
new archive contents

By default, archives track versions explicitly. This can be turned off (so that old versions are overwritten) using the flag versioned=False in create_archive; a short sketch follows the next example. Writes bump the patch version by default, but this can be overridden with the bumpversion argument on any write operation:

>>> my_archive.get_versions()
['0.0.1', '0.0.2']
>>>
>>> with my_archive.open('w+', bumpversion='major') as f:
...     f.write(u'a major improvement')
...
>>> my_archive.get_versions()
['0.0.1', '0.0.2', '1.0']
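
For comparison, the versioned=False flag mentioned above creates an archive whose contents are simply overwritten on each write. A minimal sketch (the archive name 'scratch_data' is hypothetical):

>>> scratch = api.create_archive('scratch_data', versioned=False)
>>> with scratch.open('w+') as f:
...     f.write(u'scratch contents, replaced by the next write')
...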

We can also retrieve versioned data specifically:

>>> with my_archive.open('r', version='0.0.2') as f:
...     print(f.read())
...
new archive contents
>>>
>>> with my_archive.open('r', version='1.0') as f:
...     print(f.read())
...
a major improvement
>>>

See examples for more extensive use cases.

Todo

See the issue tracker to view and add to our todos.

Credits

This package was created by Justin Simcock and Michael Delgado of the Climate Impact Lab. Check us out on GitHub.

Thanks also to audreyr for the wonderful cookiecutter package, and to pyup, a constant source of inspiration and our third contributor.

History

0.1.0 (2016-11-18)

  • First release on PyPI.
