Simple dataset to dataloader library for PyTorch
This is a simple library for creating readable dataset pipelines and reusing best practices for issues such as imbalanced datasets. There are just two components to keep track of: Dataset and Datastream.
Dataset is a simple mapping between an index and an example. It supports pipelining of functions in a readable syntax originally adapted from TensorFlow 2’s tf.data.Dataset.
Datastream combines a Dataset and a sampler into a stream of examples. It provides a simple solution to oversampling / stratification and weighted sampling, and as a final step converts to a torch.utils.data.DataLoader.
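To make the two concepts concrete, here is a minimal standard-library sketch. It does not use this library and is not its implementation: TinyDataset and tiny_stream are hypothetical names introduced only for illustration. The sketch assumes a dataset is an index-to-example mapping with lazily applied functions, and that a stream draws indices with replacement in proportion to per-example weights.

```python
import random
from itertools import islice


class TinyDataset:
    """Hypothetical stand-in for a dataset: maps an index to an example."""

    def __init__(self, source, functions=()):
        self.source = source
        self.functions = tuple(functions)

    def __len__(self):
        return len(self.source)

    def __getitem__(self, index):
        # Apply the pipelined functions lazily, in the order they were added.
        example = self.source[index]
        for function in self.functions:
            example = function(example)
        return example

    def map(self, function):
        # Return a new dataset so that pipelines can be chained and reused.
        return TinyDataset(self.source, self.functions + (function,))


def tiny_stream(dataset, weights, seed=0):
    """Hypothetical stand-in for a datastream: dataset plus weighted sampler.

    Draws indices with replacement, proportionally to `weights`, forever.
    """
    rng = random.Random(seed)
    indices = range(len(dataset))
    while True:
        (index,) = rng.choices(indices, weights=weights, k=1)
        yield dataset[index]


dataset = TinyDataset(["rare", "common", "common", "common"]).map(str.upper)

# Oversample the single "rare" example so it is drawn about as often
# as the three "common" examples combined.
stream = tiny_stream(dataset, weights=[3, 1, 1, 1])
batch = list(islice(stream, 6))
print(batch)
```

Returning a new object from map keeps each pipeline step immutable, which is one way to get the readable chained syntax described above.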
Install
pip install pytorch-datastream
Usage
The list below showcases functions that are useful in most standard and non-standard cases. It is not exhaustive. See the documentation for a more complete description of the API and its usage.
Dataset.from_subscriptable
Dataset.from_dataframe
Dataset
    .map
    .subset
    .split
    .cache
    .with_columns
Datastream.merge
Datastream.zip
Datastream
    .map
    .data_loader
    .zip_index
    .update_weights_
    .update_example_weight_
    .weight
    .state_dict
    .load_state_dict
Merge / stratify / oversample datastreams
Each fruit datastream used below repeatedly yields the string naming its fruit type.
>>> datastream = Datastream.merge([
... (apple_datastream, 2),
... (pear_datastream, 1),
... (banana_datastream, 1),
... ])
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
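The interleaving pattern in the output above can be reproduced with a small standard-library sketch. This only illustrates the idea of merging with per-stream counts and is not the library's implementation; merge_streams is a hypothetical helper, and the fruit streams are assumed to be infinite.

```python
from itertools import cycle, islice


def merge_streams(streams_and_counts):
    """Cycle through the streams, taking `count` examples from each in turn.

    Hypothetical helper; assumes every stream is infinite.
    """
    iterators = [(iter(stream), count) for stream, count in streams_and_counts]
    while True:
        for iterator, count in iterators:
            yield from islice(iterator, count)


# Stand-ins for the fruit datastreams: each repeats its fruit name forever.
apple_stream = cycle(["apple"])
pear_stream = cycle(["pear"])
banana_stream = cycle(["banana"])

merged = merge_streams([
    (apple_stream, 2),
    (pear_stream, 1),
    (banana_stream, 1),
])
batch = list(islice(merged, 8))
print(batch)
# ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
```

With counts of 2, 1 and 1, half of every batch comes from the apple stream, regardless of how many examples each underlying dataset holds.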
Zip independently sampled datastreams
Each fruit datastream used below repeatedly yields the string naming its fruit type.
>>> datastream = Datastream.zip([
... apple_datastream,
... Datastream.merge([pear_datastream, banana_datastream]),
... ])
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
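As a rough standard-library analogy for the output above, zipping pairs one example from each stream, while an unweighted merge alternates between its inputs. The sketch below is not the library's implementation; merge_equal is a hypothetical helper, and the fruit streams are assumed to be infinite.

```python
from itertools import cycle, islice


def merge_equal(*streams):
    """Alternate between streams, taking one example from each in turn.

    Hypothetical helper standing in for a merge without explicit counts.
    """
    for examples in zip(*streams):
        yield from examples


# Stand-ins for the fruit datastreams: each repeats its fruit name forever.
apple_stream = cycle(["apple"])
pear_stream = cycle(["pear"])
banana_stream = cycle(["banana"])

# Each zipped example is a tuple holding one example from every stream.
zipped = zip(apple_stream, merge_equal(pear_stream, banana_stream))
batch = list(islice(zipped, 4))
print(batch)
# [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
```

Because each side of the zip advances on its own, the second element cycles through pear and banana while the first is always apple.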
More usage examples
See the documentation for more usage examples.
Install from source
To patch the code locally for Python 3.6, run patch-python3.6.sh:
$ ./patch-python3.6.sh