
# transfer-nlp

This library is a playground NLP library built on top of PyTorch. The goal is to gradually build a design that enables researchers and engineers to quickly implement new ideas, train NLP models, and serve them in production.

You can get an overview of the high-level API in this [Colab Notebook](https://colab.research.google.com/drive/1DtC31eUejz1T0DsaEfHq_DOxEfanmrG1#scrollTo=Xzu3HPdGrnza), which shows how to use the framework on several examples. All examples in the notebook embed in-cell TensorBoard training monitoring!

The ideal use of this library is to provide a minimal implementation of a dataset loader, a vectorizer and a model. Then, given a config file with the experiment parameters, `runner.py` takes care of the training pipeline.
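To make the vectorizer piece concrete, here is a minimal sketch of what such a component could look like. The class and method names below are illustrative assumptions, not the actual transfer-nlp API:

```python
# Hypothetical sketch of a minimal vectorizer -- names are illustrative
# assumptions, not the actual transfer-nlp API.

class WhitespaceVectorizer:
    """Maps whitespace-separated tokens to integer ids."""

    def __init__(self, texts):
        # Index 0 is reserved for padding and unknown tokens.
        tokens = sorted({tok for text in texts for tok in text.split()})
        self.token_to_idx = {tok: i + 1 for i, tok in enumerate(tokens)}

    def vectorize(self, text, max_len=10):
        """Return a fixed-length list of token ids, zero-padded on the right."""
        ids = [self.token_to_idx.get(tok, 0) for tok in text.split()][:max_len]
        return ids + [0] * (max_len - len(ids))
```

A model (typically a `torch.nn.Module`) would then consume these fixed-length id sequences as input batches.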

Before using this repository:

  • Create a virtual environment: `mkvirtualenv YourEnvName`

  • Clone the repository: `git clone https://github.com/feedly/transfer-nlp.git`

  • Install requirements: `pip install -r requirements.txt`

Structure of the library:

loaders:

  • `transfer-nlp/loaders/vocabulary.py`: contains classes for vocabularies

  • `transfer-nlp/loaders/vectorizers.py`: classes for vectorizers

  • `transfer-nlp/loaders/loaders.py`: classes for dataset loaders
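For intuition, a vocabulary class of this kind typically maintains a bidirectional token/index mapping. A hypothetical minimal version (not the actual transfer-nlp implementation) could be:

```python
# Illustrative sketch of a vocabulary -- not the actual transfer-nlp code.

class Vocabulary:
    """Bidirectional mapping between tokens and integer indices."""

    def __init__(self, unk_token="<UNK>"):
        self.token_to_idx = {}
        self.idx_to_token = {}
        # The unknown token is registered first, so it gets index 0.
        self.unk_index = self.add_token(unk_token)

    def add_token(self, token):
        """Register a token if new; return its index either way."""
        if token not in self.token_to_idx:
            idx = len(self.token_to_idx)
            self.token_to_idx[token] = idx
            self.idx_to_token[idx] = token
        return self.token_to_idx[token]

    def lookup_token(self, token):
        """Return the token's index, falling back to the <UNK> index."""
        return self.token_to_idx.get(token, self.unk_index)

    def lookup_index(self, index):
        """Return the token stored at a given index."""
        return self.idx_to_token[index]
```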

`transfer-nlp/models/`: contains implementations of NLP models

`transfer-nlp/embeddings/`: contains utility functions for embeddings management

`transfer-nlp/experiments/`: each experiment is defined by a JSON config file specifying all of its parameters
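Such a config file might plausibly look like the following. The keys and values here are hypothetical, shown only to convey the idea of declaring a whole experiment in JSON:

```json
{
  "experiment_name": "surname_classification",
  "dataset": "data/surnames.csv",
  "model": "MultiLayerPerceptron",
  "hidden_dim": 300,
  "batch_size": 128,
  "num_epochs": 10,
  "learning_rate": 0.001,
  "cuda": false
}
```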

`transfer-nlp/runners/`: contains the full training pipeline, driven by an experiment config file

Some objectives to reach:

  • Unit-test everything

  • Smooth the runner pipeline to enable multi-task training (without constraining the way we do multi-task, whether linear, hierarchical or else…)

  • Include examples using state-of-the-art pre-trained models

  • Enable Slack integration for notifications on model crashes / completion

  • Enable embeddings visualisation (see https://projector.tensorflow.org/)

  • Enable fine-tuning of pre-trained models

  • Include linguistic properties in models

  • Experiment with RL for sequential tasks

  • Include probing tasks to try to understand the properties that are learned by the models

This library builds on the book <cite>[“Natural Language Processing with PyTorch”](https://www.amazon.com/dp/1491978236/)</cite> by Delip Rao and Brian McMahan for the initial experiments.
