pytorch-datastream

Simple dataset-to-dataloader library for PyTorch

Project description

This is a simple library for creating readable dataset pipelines and reusing best practices for issues such as imbalanced datasets. There are just two components to keep track of: Dataset and Datastream.

Dataset is a simple mapping between an index and an example. It lets you chain functions into a pipeline with a readable syntax originally adapted from TensorFlow 2's tf.data.Dataset.
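
To make the idea of an index-to-example mapping with chained functions concrete, here is a minimal pure-Python sketch. ToyDataset is a hypothetical stand-in written only for illustration. It is not the library's Dataset class and only mimics the chained .map style.

```python
from typing import Any, Callable, Sequence


class ToyDataset:
    """Hypothetical stand-in: maps an integer index to an example."""

    def __init__(
        self,
        source: Sequence[Any],
        fn: Callable[[Any], Any] = lambda x: x,
    ) -> None:
        self.source = source
        self.fn = fn

    def __len__(self) -> int:
        return len(self.source)

    def __getitem__(self, index: int) -> Any:
        # The pipeline runs lazily, only when an example is requested.
        return self.fn(self.source[index])

    def map(self, fn: Callable[[Any], Any]) -> "ToyDataset":
        # Compose the new function on top of the existing pipeline.
        previous = self.fn
        return ToyDataset(self.source, lambda x: fn(previous(x)))


dataset = (
    ToyDataset([1, 2, 3])
    .map(lambda x: x * 10)
    .map(lambda x: x + 1)
)
print([dataset[i] for i in range(len(dataset))])
```

Each call to map returns a new object, so intermediate pipelines stay reusable.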

Datastream combines a Dataset and a sampler into a stream of examples. It provides a simple solution to oversampling / stratification, weighted sampling, and finally converting to a torch.utils.data.DataLoader.
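
The combination of a dataset and a sampler can be pictured as an endless generator that draws indices according to per-example weights. The sketch below is a conceptual illustration in plain Python; toy_stream and its parameters are hypothetical names and do not reflect how the library implements sampling.

```python
import random
from itertools import islice
from typing import Any, Iterator, Sequence


def toy_stream(
    examples: Sequence[Any],
    weights: Sequence[float],
    seed: int = 0,
) -> Iterator[Any]:
    """Hypothetical sampler: yields examples forever, drawn by weight."""
    rng = random.Random(seed)
    indices = range(len(examples))
    while True:
        # Draw one index with probability proportional to its weight.
        (index,) = rng.choices(indices, weights=weights, k=1)
        yield examples[index]


# Oversample the single rare example by giving it a larger weight.
examples = ["rare", "common", "common", "common"]
weights = [3.0, 1.0, 1.0, 1.0]
batch = list(islice(toy_stream(examples, weights), 8))
print(batch)
```

An example with weight zero is never drawn, which is one way to think about excluding examples from a stream without rebuilding the dataset.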

Install

poetry add pytorch-datastream

Or, for the old-timers:

pip install pytorch-datastream

Usage

The list below showcases functions that are useful in most standard and non-standard cases. It is not exhaustive. See the documentation for a more complete overview of the API and its usage.

Dataset.from_subscriptable
Dataset.from_dataframe
Dataset
    .map
    .subset
    .split
    .cache
    .with_columns

Datastream.merge
Datastream.zip
Datastream
    .map
    .data_loader
    .zip_index
    .update_weights_
    .update_example_weight_
    .weight
    .state_dict
    .load_state_dict

Merge / stratify / oversample datastreams

Each fruit datastream below (apple_datastream, pear_datastream, banana_datastream) repeatedly yields the string of its fruit type.

>>> datastream = Datastream.merge([
...     (apple_datastream, 2),
...     (pear_datastream, 1),
...     (banana_datastream, 1),
... ])
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
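
The pattern in that batch, two apples for every pear and banana, can be reproduced with a small deterministic interleaving sketch. toy_merge is a hypothetical helper, itertools.cycle stands in for the fruit datastreams, and the sketch assumes every stream is infinite. It illustrates the ratio only and is not the library's merge implementation.

```python
from itertools import cycle, islice
from typing import Any, Iterable, Iterator, List, Tuple


def toy_merge(streams_and_counts: List[Tuple[Iterable[Any], int]]) -> Iterator[Any]:
    """Hypothetical merge: take n items from each stream in turn, forever.

    Assumes every stream is infinite.
    """
    iterators = [(iter(stream), n) for stream, n in streams_and_counts]
    while True:
        for iterator, n in iterators:
            for _ in range(n):
                yield next(iterator)


merged = toy_merge([
    (cycle(["apple"]), 2),
    (cycle(["pear"]), 1),
    (cycle(["banana"]), 1),
])
batch = list(islice(merged, 8))
print(batch)
# ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
```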

Zip independently sampled datastreams

Each fruit datastream below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.zip([
...     apple_datastream,
...     Datastream.merge([pear_datastream, banana_datastream]),
... ])
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
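
Zipping pairs up one example from each stream per step, with each stream advancing independently. A plain-Python sketch of the same shape follows; itertools.cycle is an assumed stand-in for the datastreams, and the alternating pear/banana iterator stands in for the merged stream.

```python
from itertools import cycle, islice

apples = cycle(["apple"])
# Stand-in for the merged pear/banana stream with equal weights.
pears_and_bananas = cycle(["pear", "banana"])

# Each step takes one example from every stream and yields them as a tuple.
zipped = zip(apples, pears_and_bananas)
batch = list(islice(zipped, 4))
print(batch)
# [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
```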

More usage examples

See the documentation for more usage examples.

Install from source

Release history

Version	Release date
0.4.10 (this version)	Nov 28, 2023
0.4.9	May 3, 2023
0.4.8	Oct 25, 2022
0.4.7 (yanked)	Oct 25, 2022
0.4.6	Nov 24, 2021
0.4.5	May 5, 2021
0.4.4	May 3, 2021
0.4.3	Mar 25, 2021
0.4.2	Feb 23, 2021
0.4.1	Dec 24, 2020
0.4.0	Dec 14, 2020
0.3.10	Dec 1, 2020
0.3.9	Nov 11, 2020
0.3.8	Oct 30, 2020
0.3.7	Oct 27, 2020
0.3.6	Oct 22, 2020
0.3.5	Oct 7, 2020
0.3.3	Oct 7, 2020
0.3.2	Sep 18, 2020
0.3.1	Aug 29, 2020
0.3.0	Jul 9, 2020
0.2.8	Jul 6, 2020
0.2.7	Jun 30, 2020
0.2.6	Jun 27, 2020
0.2.5	Jun 27, 2020
0.2.4	Jun 15, 2020
0.2.3	Jun 15, 2020
0.2.2	Jun 15, 2020
0.2.1	Jun 13, 2020
0.2.0	Jun 13, 2020
0.1.0	Jun 10, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pytorch_datastream-0.4.10.tar.gz (23.5 kB)

Uploaded Nov 28, 2023 Source

Built Distribution

pytorch_datastream-0.4.10-py3-none-any.whl (28.9 kB)

Uploaded Nov 28, 2023 Python 3

Hashes for pytorch_datastream-0.4.10.tar.gz
Algorithm	Hash digest
SHA256	`2943cf82091090d1d459cf4e7f3ac7a242f038efb95709dc0b4e08f5c7896bfd`
MD5	`373abf64ad2c0e53b7f31fd8531a49d5`
BLAKE2b-256	`edd37f5481bbe604973c4995b2734019a4ffa3220a7f520f8b62bbd18c98fb8b`

Hashes for pytorch_datastream-0.4.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`89afca7f52fe24351caf69bcba44c2e5c2a1bfba05e0421da27d80006ef4283a`
MD5	`e67bf36fbe4f82e3e9046fdf97fa6429`
BLAKE2b-256	`dfd55067dd96e96aea49737f7b7cd411645d76bb2327fb495ffb89c82de431fe`