Wrangl

Parallel data preprocessing for NLP and ML. See the docs for details. If you find this work helpful, please consider citing:

@misc{zhong2021wrangl,
  author = {Zhong, Victor},
  title = {Wrangl: Parallel data preprocessing for NLP and ML},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/vzhong/wrangl}}
}

The supervised learning dataset parallelization component of this library uses Ray. The reinforcement learning environment parallelization component uses TorchBeast.

Installation

# editable install from a local clone; use ".[dev]" instead of "." to also run tests and build docs
pip install -e .

# latest development version from GitHub
pip install git+https://github.com/vzhong/wrangl

# PyPI release
pip install wrangl
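
To confirm which release ended up installed, one option is to query the package metadata from Python. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of wrangl.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed wrangl version, or None if wrangl is not installed.
    print(installed_version("wrangl"))
```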

Usage

See examples for usage. Here are some common use cases:

process data in parallel
train models
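
As a rough illustration of the first use case, the sketch below maps a preprocessing function over a list of raw examples with a worker pool. It deliberately uses only Python's standard library (`concurrent.futures`) rather than wrangl or Ray, so `preprocess`, `process_in_parallel`, and the toy whitespace tokenizer are all hypothetical stand-ins and not wrangl's API; the examples remain the reference for the real interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def preprocess(text: str) -> Dict[str, object]:
    """Toy preprocessing step: lowercase the text and split it on whitespace."""
    tokens = text.lower().split()
    return {"tokens": tokens, "length": len(tokens)}


def process_in_parallel(texts: List[str], num_workers: int = 4) -> List[Dict[str, object]]:
    """Apply `preprocess` to every text using a pool of worker threads.

    Executor.map yields results in input order, so outputs line up with inputs.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(preprocess, texts))


if __name__ == "__main__":
    raw = ["Hello World", "Parallel data preprocessing for NLP and ML"]
    for row in process_in_parallel(raw):
        print(row)
```

Threads keep the sketch self-contained. For CPU-bound preprocessing, a process pool or a framework such as Ray is what actually provides a parallel speedup.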

Commandline utilities

Currently supports:

annotating text files
plotting learning curves
autodocumenting this package

wrangl -h

Run tests

python -m unittest discover tests

Generate docs

wrangl autodoc

Release history

0.0.8: May 9, 2022
0.0.6: Dec 13, 2021 (this version)
0.0.5: Sep 29, 2021
0.0.4: Sep 26, 2021
0.0.1: Sep 1, 2021

Download files

Source Distribution

wrangl-0.0.6.tar.gz (17.3 kB), uploaded Dec 13, 2021

Hashes for wrangl-0.0.6.tar.gz

Algorithm	Hash digest
SHA256	`bd4c02970c323774cc620ccee0618504919767015f9176ac6a12c62452d662cf`
MD5	`a89358c63868d4426ea78a658b2328e9`
BLAKE2b-256	`e59ec45ca4ec170827a97584fd95222b7746021b0bf3920aea7eb914f93af5f3`
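
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper name `sha256_of` and the local file path are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for wrangl-0.0.6.tar.gz.
EXPECTED_SHA256 = "bd4c02970c323774cc620ccee0618504919767015f9176ac6a12c62452d662cf"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("wrangl-0.0.6.tar.gz")  # assumed download location
    if archive.exists():
        print("match" if sha256_of(str(archive)) == EXPECTED_SHA256 else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```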