datalad

data distribution geared toward scientific datasets

These details have been verified by PyPI

Maintainers

bpoldrack jwodder kyleam Michael.Hanke yarikoptic

Project description

     ____            _             _                   _
    |  _ \    __ _  | |_    __ _  | |       __ _    __| |
    | | | |  / _` | | __|  / _` | | |      / _` |  / _` |
    | |_| | | (_| | | |_  | (_| | | |___  | (_| | | (_| |
    |____/   \__,_|  \__|  \__,_| |_____|  \__,_|  \__,_|
                                                  Read me

[![Travis tests status](https://secure.travis-ci.org/datalad/datalad.png?branch=master)](https://travis-ci.org/datalad/datalad) [![codecov.io](https://codecov.io/github/datalad/datalad/coverage.svg?branch=master)](https://codecov.io/github/datalad/datalad?branch=master) [![Documentation](https://readthedocs.org/projects/datalad/badge/?version=latest)](http://datalad.rtfd.org) [![Testimonials 4](https://img.shields.io/badge/testimonials-4-brightgreen.svg)](https://github.com/datalad/datalad/wiki/Testimonials) [![https://www.singularity-hub.org/static/img/hosted-singularity--hub-%23e32929.svg](https://www.singularity-hub.org/static/img/hosted-singularity--hub-%23e32929.svg)](https://singularity-hub.org/collections/667)

The full documentation is available at: http://docs.datalad.org

# 10000ft overview

DataLad makes data management and data distribution more accessible.
To do that, it stands on the shoulders of [Git] and [Git-annex] to deliver a
decentralized system for data exchange. This includes automated ingestion of
data from online portals and exposing it in readily usable form as Git(-annex)
repositories, so-called datasets. The actual data storage and permission
management, however, remain with the original data providers.

# Status

DataLad is under rapid development. While the code base is still growing,
the focus is increasingly shifting towards robust and safe operation
with a sensible API. Organization and configuration are still subject to
considerable reorganization and standardization. However, DataLad is
usable today and user feedback is always welcome.

# Support

[Neurostars](https://neurostars.org) is the preferred venue for DataLad
support. Forum login is possible with your existing Google, Twitter, or GitHub
account. Before posting a [new
topic](https://neurostars.org/new-topic?tags=datalad), please check the
[previous posts](https://neurostars.org/search?q=tags%3Adatalad) tagged with
`#datalad`. To get help on a DataLad-related issue, please consider following
this [message
template](https://neurostars.org/new-topic?body=-%20Please%20describe%20the%20problem.%0A-%20What%20steps%20will%20reproduce%20the%20problem%3F%0A-%20What%20version%20of%20DataLad%20are%20you%20using%20%28run%20%60datalad%20--version%60%29%3F%20On%20what%20operating%20system%20%28consider%20running%20%60datalad%20plugin%20wtf%60%29%3F%0A-%20Please%20provide%20any%20additional%20information%20below.%0A-%20Have%20you%20had%20any%20luck%20using%20DataLad%20before%3F%20%28Sometimes%20we%20get%20tired%20of%20reading%20bug%20reports%20all%20day%20and%20a%20lil'%20positive%20end%20note%20does%20wonders%29&tags=datalad).

# DataLad 101

A growing number of datasets is available from http://datasets.datalad.org.
These datasets are regular git/git-annex repositories organized into
a hierarchy using the git submodule mechanism. You can therefore use regular
git/git-annex commands to work with them, but you might need `datalad` to be
installed to provide additional functionality (e.g., fetching from
portals requiring authentication such as CRCNS, HCP; or accessing data
originally distributed in tarballs). DataLad, however, aims to provide a
higher-level interface on top of git/git-annex to simplify the consumption and
sharing of new or derived datasets. To that end, you can install **all** of
those datasets using

    datalad install -r ///

which will `git clone` all of those datasets under the `datasets.datalad.org`
sub-directory. This command will not fetch any large data files; it will
merely recreate the full hierarchy of all of those datasets locally, which
still takes up a good chunk of your filesystem metadata storage. Instead of
fetching all datasets at once, you could either specify a single dataset to
be installed, e.g.

    datalad install ///openfmri/ds000113

or install the top-level dataset by omitting the `-r` option and then calling
`datalad install` for the specific sub-datasets you want to have installed,
possibly with `-r` to install their sub-datasets as well, e.g.

    datalad install ///
    cd datasets.datalad.org
    datalad install -r openfmri/ds000001 indi/fcon1000

You can navigate the datasets you have installed in your terminal or browser,
fetching necessary files or installing new sub-datasets with the
`datalad get [FILE|DIR]` command. DataLad will take care of
downloading, extracting, and possibly authenticating (it will ask you for
credentials) in a uniform fashion regardless of the original data location
or distribution serialization (e.g., a tarball). Since it uses git
and git-annex underneath, you can be assured that you are getting the **exact**
correct version of the data.
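
The install-then-get workflow described above can also be scripted. The following is a minimal sketch, assuming the `datalad` executable is on your `PATH`; the helper names `datalad_argv` and `run_datalad` and the path `path/to/file` are hypothetical and only serve as illustration. It builds the same command lines shown above and runs them through `subprocess`.

```python
import shutil
import subprocess


def datalad_argv(subcommand, *targets, recursive=False):
    """Build the argument list for a `datalad install` or `datalad get` call."""
    if subcommand not in ("install", "get"):
        raise ValueError(f"unsupported subcommand: {subcommand}")
    argv = ["datalad", subcommand]
    if recursive:
        # Same `-r` flag as in `datalad install -r ///` above.
        argv.append("-r")
    argv.extend(targets)
    return argv


def run_datalad(argv, cwd=None):
    """Run the command if `datalad` is available; return True if it was run."""
    if shutil.which("datalad") is None:
        print("datalad not found; would run:", " ".join(argv))
        return False
    subprocess.run(argv, cwd=cwd, check=True)
    return True


if __name__ == "__main__":
    # Only print the commands here; pass them to run_datalad() to execute them.
    for argv in (
        datalad_argv("install", "///openfmri/ds000113"),
        datalad_argv("install", "openfmri/ds000001", "indi/fcon1000", recursive=True),
        datalad_argv("get", "path/to/file"),  # hypothetical file inside a dataset
    ):
        print(" ".join(argv))
```

Keeping command construction separate from execution makes it possible to inspect what would be run before any dataset is cloned.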

The use cases DataLad covers are not limited to the "consumption" of data.
DataLad also aims to help with publishing original or derived data, thus facilitating
more efficient data management when collaborating or simply sharing your data.
You can find more documentation at http://docs.datalad.org.

# Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) if you are interested in internals or
contributing to the project.

# Installation

## Debian-based systems

On Debian-based systems, we recommend enabling [NeuroDebian],
from which we provide recent releases of DataLad. The DataLad package recommends
some relatively heavy packages (e.g. scrapy) which are useful only if you are
interested in using the `crawl` functionality. If you need just the base
functionality of DataLad, install it without the recommended packages
(e.g., `apt-get install --no-install-recommends datalad`).

## Other Linux distributions, OSX (Windows support still TODO) via pip

By default, installation via pip installs the core functionality of DataLad,
which allows for managing datasets etc. Additional installation schemes
are available, so you can request an enhanced installation via
`pip install datalad[SCHEME]`, where `SCHEME` can be

- `crawl`
  to also install `scrapy`, which is used in some crawling constructs
- `tests`
  to also install the dependencies used by DataLad's battery of unit tests
- `full`
  to install all dependencies.

For installation through `pip`, you will need some external dependencies
that pip does not ship (e.g. `git-annex`); please refer to
the next section for those.
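
As an illustration of the installation schemes above, here is a minimal sketch that assembles the matching `pip` invocation and checks for external tools that pip does not provide. The function names are hypothetical, and checking for the executables `git` and `git-annex` on `PATH` is an assumption about how one might detect the missing dependencies, not something DataLad itself prescribes.

```python
import shutil
import sys

# Installation schemes named in the list above.
KNOWN_SCHEMES = ("crawl", "tests", "full")


def pip_requirement(scheme=None):
    """Return the requirement string for the core install or for one scheme."""
    if scheme is None:
        return "datalad"
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    return f"datalad[{scheme}]"


def pip_install_argv(scheme=None):
    """Build a `python -m pip install ...` command for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", pip_requirement(scheme)]


def missing_external_tools(tools=("git", "git-annex")):
    """List the external executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print(" ".join(pip_install_argv("crawl")))
    missing = missing_external_tools()
    if missing:
        print("install these separately:", ", ".join(missing))
```

When running the resulting command from a shell, quote the requirement (for example `'datalad[crawl]'`), since some shells treat square brackets as glob characters.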

## Dependencies

Our [setup.py] and accompanying packaging describe all necessary dependencies.
On Debian-based systems we recommend enabling [NeuroDebian],
since we use it to provide backports of recent, fixed versions of the external
modules we depend upon. An up-to-date [Git-annex] is also necessary for proper
operation of DataLad (install `git-annex-standalone` from the NeuroDebian repository).
Additionally, if you would like to develop DataLad and run our test battery, see
[CONTRIBUTING.md](CONTRIBUTING.md) regarding additional dependencies.

Later we will provide bundled installations of DataLad across popular
platforms.

# License

MIT/Expat

## Acknowledgements

DataLad development is supported by a US-German collaboration in
computational neuroscience (CRCNS) project "DataGit: converging catalogues,
warehouses, and deployment logistics into a federated 'data distribution'"
(Halchenko/Hanke), co-funded by the US National Science Foundation (NSF
1429999) and the German Federal Ministry of Education and Research (BMBF
01GQ1411). Additional support is provided by the German federal state of
Saxony-Anhalt and the European Regional Development
Fund (ERDF), Project: Center for Behavioral Brain Sciences, Imaging Platform.

[Git]: https://git-scm.com
[Git-annex]: http://git-annex.branchable.com
[setup.py]: https://github.com/datalad/datalad/blob/master/setup.py
[NeuroDebian]: http://neuro.debian.net

Download files

Source Distribution

datalad-0.10.0rc2.tar.gz (1.2 MB), uploaded May 6, 2018

Built Distribution

datalad-0.10.0rc2-py2.py3-none-any.whl (1.3 MB), uploaded May 6, 2018, for Python 2 and Python 3

Hashes for datalad-0.10.0rc2.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `94b4da2ed5364bf61d5980d15b88bc17110b6fe91ab31db71d703048e9816c24` |
| MD5 | `cffa71352a2af76582c7d7634fcb2d60` |
| BLAKE2b-256 | `f8474e193700784d173879f20747acb81c3a182c5fc268598f21b416e7c792d3` |

Hashes for datalad-0.10.0rc2-py2.py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `f8006b2d355bcd1c4400dd774e97f5cea3b0ba5ed98545674f92299329dece85` |
| MD5 | `5a9a5bc7df6681fcd32885e0df31c3ab` |
| BLAKE2b-256 | `311f975e2915430c85fce987219438dc5dfd01da7c25adae205c12e5731e08a8` |
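
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch, assuming the files were downloaded into the current directory under their original names; the helper names `sha256_of` and `verify_download` are illustrative and are not part of DataLad or pip.

```python
import hashlib
import os

# SHA256 digests copied from the hash listings above.
PUBLISHED_SHA256 = {
    "datalad-0.10.0rc2.tar.gz":
        "94b4da2ed5364bf61d5980d15b88bc17110b6fe91ab31db71d703048e9816c24",
    "datalad-0.10.0rc2-py2.py3-none-any.whl":
        "f8006b2d355bcd1c4400dd774e97f5cea3b0ba5ed98545674f92299329dece85",
}


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=None):
    """Compare a file's digest with `expected`, or with the published one for its name."""
    if expected is None:
        # Raises KeyError for file names that are not listed above.
        expected = PUBLISHED_SHA256[os.path.basename(path)]
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    for name in PUBLISHED_SHA256:
        if os.path.exists(name):
            print(name, "OK" if verify_download(name) else "MISMATCH")
        else:
            print(name, "not downloaded")
```

Reading in chunks keeps memory use constant regardless of the archive size.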