Parallel data preprocessing for NLP and ML.
Wrangl
Parallel data preprocessing for NLP and ML. See docs here. If you find this work helpful, please consider citing:
@misc{zhong2021wrangl,
author = {Zhong, Victor},
title = {Wrangl: Parallel data preprocessing for NLP and ML},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/vzhong/wrangl}}
}
The supervised learning dataset parallelization component of this library uses Ray. The reinforcement learning environment parallelization component uses TorchBeast.
Installation
# from a local clone (use ".[dev]" instead of "." if you want to run tests and build docs)
pip install -e .
# latest version from GitHub
pip install git+https://github.com/vzhong/wrangl
# PyPI release
pip install wrangl
Usage
See examples for usage. Here are some common use cases:
- process data in parallel
- train models
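For readers new to the idea, the first use case can be illustrated without Wrangl at all. The sketch below is not Wrangl's API and does not use Ray; it swaps in Python's standard `concurrent.futures` module to show the general pattern of mapping a preprocessing function over a dataset with a pool of workers. The names `tokenize` and `preprocess_parallel` are hypothetical and chosen only for illustration; see the examples for how Wrangl itself is used.

```python
from concurrent.futures import ThreadPoolExecutor


def tokenize(example):
    """Toy preprocessing step (hypothetical): lowercase the text and split on whitespace."""
    return {"id": example["id"], "tokens": example["text"].lower().split()}


def preprocess_parallel(examples, fn, num_workers=4, executor_cls=ThreadPoolExecutor):
    """Apply fn to every example with a pool of workers, preserving input order."""
    with executor_cls(max_workers=num_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(fn, examples))


raw = [
    {"id": 0, "text": "Parallel data preprocessing"},
    {"id": 1, "text": "for NLP and ML"},
]
processed = preprocess_parallel(raw, tokenize)
for row in processed:
    print(row)
```

A thread pool keeps the sketch simple, but threads give little speedup for CPU-bound work in CPython. Passing `executor_cls=ProcessPoolExecutor` (also from `concurrent.futures`) is one way to use multiple cores, provided the script is guarded with `if __name__ == "__main__":` on platforms that spawn worker processes.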
Additional utilities
Annotate data from the command line:
wannotate -h
Run tests
python -m unittest discover tests