Wrangl
Parallel data preprocessing for NLP and ML. See docs here. If you find this work helpful, please consider citing:
@misc{zhong2021wrangl,
author = {Zhong, Victor},
title = {Wrangl: Parallel data preprocessing for NLP and ML},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/vzhong/wrangl}}
}
The supervised learning dataset parallelization component of this library uses Ray. The reinforcement learning environment parallelization component uses TorchBeast.
Installation
pip install -e . # from a local clone; use ".[dev]" instead of "." to also run tests and build docs
# for latest
pip install git+https://github.com/vzhong/wrangl
# pypi release
pip install wrangl
Usage
See examples for usage. Here are some common use cases:
- process data in parallel
- train models
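As a rough illustration of the first use case, the sketch below maps a preprocessing function over lines of text in parallel using only the Python standard library. It does not use wrangl's API: `preprocess_line` and `preprocess_parallel` are hypothetical names introduced here, and a thread pool stands in for the Ray workers this library relies on.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess_line(line):
    """Hypothetical preprocessing step: lowercase and split on whitespace."""
    return line.lower().split()


def preprocess_parallel(lines, num_workers=4):
    """Apply preprocess_line to every line using a pool of worker threads.

    Executor.map yields results in input order, regardless of which
    worker finishes first.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(preprocess_line, lines))


if __name__ == "__main__":
    corpus = ["Hello World", "Parallel data preprocessing", "for NLP and ML"]
    print(preprocess_parallel(corpus))
```

For CPU-bound preprocessing in pure Python, threads are limited by the global interpreter lock, so a process pool or a framework such as Ray is the usual choice there.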
Command-line utilities
Currently supports:
- annotating text files
- plotting learning curves
- autodocumenting this package
wrangl -h
Run tests
python -m unittest discover tests
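By default, `unittest` discovery collects files matching `test*.py` under the given directory. A minimal sketch of such a file follows, assuming a hypothetical path `tests/test_example.py` and a hypothetical helper `normalize` that is not part of wrangl:

```python
import unittest


def normalize(text):
    """Hypothetical helper under test: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


class TestNormalize(unittest.TestCase):
    def test_collapses_whitespace(self):
        self.assertEqual(normalize("  Hello   World "), "hello world")

    def test_empty_string(self):
        self.assertEqual(normalize(""), "")
```

Discovery loads every `TestCase` subclass in the file, so no `unittest.main()` call is needed.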
Generate docs
wrangl autodoc