
# transfer-nlp

This library is a playground NLP library built on top of PyTorch. The goal is to gradually build a design that enables researchers and engineers to quickly implement new ideas, train NLP models, and serve them in production.

You can get an overview of the high-level API in this [Colab Notebook](https://colab.research.google.com/drive/1DtC31eUejz1T0DsaEfHq_DOxEfanmrG1#scrollTo=Xzu3HPdGrnza), which shows how to use the framework on several examples. All examples in the notebook embed in-cell TensorBoard training monitoring!

The ideal use of this library is to provide a minimal implementation of a dataset loader, a vectorizer and a model. Then, given a config file with the experiment parameters, `runner.py` takes care of the training pipeline.
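To make the vectorizer piece concrete, here is a minimal sketch of what such a component could look like. The class and method names below are illustrative assumptions, not the actual transfer-nlp API:

```python
# Hypothetical sketch of a minimal vectorizer -- names are illustrative
# assumptions, not the actual transfer-nlp API.

class WhitespaceVectorizer:
    """Maps whitespace-separated tokens to integer ids."""

    def __init__(self, texts):
        # Index 0 is reserved for padding and unknown tokens.
        tokens = sorted({tok for text in texts for tok in text.split()})
        self.token_to_idx = {tok: i + 1 for i, tok in enumerate(tokens)}

    def vectorize(self, text, max_len=10):
        """Return a fixed-length list of token ids, zero-padded on the right."""
        ids = [self.token_to_idx.get(tok, 0) for tok in text.split()][:max_len]
        return ids + [0] * (max_len - len(ids))
```

A model (typically a `torch.nn.Module`) would then consume these fixed-length id sequences as input batches.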

Before using this repository:

  • Create a virtual environment: `mkvirtualenv YourEnvName`

  • Clone the repository: `git clone https://github.com/feedly/transfer-nlp.git`

  • Install requirements: `pip install -r requirements.txt`

Structure of the library:

loaders:

  • `transfer-nlp/loaders/vocabulary.py`: contains classes for vocabularies

  • `transfer-nlp/loaders/vectorizers.py`: classes for vectorizers

  • `transfer-nlp/loaders/loaders.py`: classes for dataset loaders
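For intuition, a vocabulary class of this kind typically maintains a bidirectional token/index mapping. A hypothetical minimal version (not the actual transfer-nlp implementation) could be:

```python
# Illustrative sketch of a vocabulary -- not the actual transfer-nlp code.

class Vocabulary:
    """Bidirectional mapping between tokens and integer indices."""

    def __init__(self, unk_token="<UNK>"):
        self.token_to_idx = {}
        self.idx_to_token = {}
        # The unknown token is registered first, so it gets index 0.
        self.unk_index = self.add_token(unk_token)

    def add_token(self, token):
        """Register a token if new; return its index either way."""
        if token not in self.token_to_idx:
            idx = len(self.token_to_idx)
            self.token_to_idx[token] = idx
            self.idx_to_token[idx] = token
        return self.token_to_idx[token]

    def lookup_token(self, token):
        """Return the token's index, falling back to the <UNK> index."""
        return self.token_to_idx.get(token, self.unk_index)

    def lookup_index(self, index):
        """Return the token stored at a given index."""
        return self.idx_to_token[index]
```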

`transfer-nlp/models/`: contains implementations of NLP models

`transfer-nlp/embeddings/`: contains utility functions for embeddings management

`transfer-nlp/experiments/`: each experiment is defined by a JSON config file specifying all of its parameters
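Such a config file might plausibly look like the following. The keys and values here are hypothetical, shown only to convey the idea of declaring a whole experiment in JSON:

```json
{
  "experiment_name": "surname_classification",
  "dataset": "data/surnames.csv",
  "model": "MultiLayerPerceptron",
  "hidden_dim": 300,
  "batch_size": 128,
  "num_epochs": 10,
  "learning_rate": 0.001,
  "cuda": false
}
```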

`transfer-nlp/runners/`: contains the full training pipeline, driven by an experiment config file

Some objectives to reach:

  • Unit-test everything

  • Smooth the runner pipeline to enable multi-task training (without constraining the way we do multi-task, whether linear, hierarchical or else…)

  • Include examples using state-of-the-art pre-trained models

  • Enable Slack integration for notifications on model crashes / completion

  • Enable embeddings visualisation (see https://projector.tensorflow.org/)

  • Enable fine-tuning of pre-trained models

  • Include linguistic properties in models

  • Experiment with RL for sequential tasks

  • Include probing tasks to try to understand the properties that are learned by the models

This library builds on the book <cite>[“Natural Language Processing with PyTorch”](https://www.amazon.com/dp/1491978236/)</cite> by Delip Rao and Brian McMahan for the initial experiments.
