LanguageFlow

License: GNU General Public License v3

Data loaders and abstractions for text and NLP

Requirements

Install dependencies

$ pip install future tox
$ pip install python-crfsuite==0.9.5
$ pip install Cython
$ pip install -U fasttext --no-cache-dir --no-deps --force-reinstall
$ pip install xgboost==0.82

Installation

$ pip install languageflow

Components

  • Transformers: NumberRemover, CountVectorizer, TfidfVectorizer

  • Models: SGDClassifier, XGBoostClassifier, KimCNNClassifier, FastTextClassifier, CRF
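To make the transformer names above more concrete, here is a minimal pure-Python sketch of what a number-removal step followed by count vectorization typically does. It does not call LanguageFlow's own classes, whose signatures are not shown in this README; the helper names `remove_numbers` and `count_vectorize` are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter


def remove_numbers(text):
    """Drop digit runs such as "12345" or "3.5" and collapse leftover whitespace."""
    without_digits = re.sub(r"\d+(?:[.,]\d+)*", " ", text)
    return " ".join(without_digits.split())


def count_vectorize(documents):
    """Return (vocabulary, rows); each row holds per-term counts for one document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Hypothetical sample sentences, not taken from any LanguageFlow dataset.
docs = ["Account 12345 was opened", "Card fee was 3.5 percent"]
cleaned = [remove_numbers(doc) for doc in docs]
vocabulary, rows = count_vectorize(cleaned)
```

The resulting rows are the kind of feature vectors that a classifier such as those listed under Models would consume.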

Data

Download a dataset with the download command

$ languageflow download DATASET

List all datasets

$ languageflow list
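If the two commands above need to be scripted, one option is to call them from Python with `subprocess`. The sketch below assumes only the two command forms shown in this README (`languageflow list` and `languageflow download DATASET`); the helper names `languageflow_command` and `run_languageflow` are hypothetical and not part of the package.

```python
import shutil
import subprocess


def languageflow_command(action, dataset=None):
    """Build the argument list for `languageflow list` or `languageflow download DATASET`."""
    if action == "list":
        return ["languageflow", "list"]
    if action == "download":
        if not dataset:
            raise ValueError("download needs a dataset name")
        return ["languageflow", "download", dataset]
    raise ValueError("unsupported action: " + action)


def run_languageflow(action, dataset=None):
    """Run the command if the CLI is on PATH.

    Returns the exit code, or None when the `languageflow` executable is missing.
    """
    if shutil.which("languageflow") is None:
        return None
    return subprocess.run(languageflow_command(action, dataset)).returncode
```

Calling `run_languageflow("download", "UTS2017_BANK")` would then be equivalent to typing the download command in a shell.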

Datasets

The datasets module currently contains:

  • Tagged: VLSP2018-NER, VTB-CHUNK*, VLSP2016-NER*, VLSP2013-POS*, VLSP2013-WTK*

  • Categorized: AIVIVN2019_SA*, VLSP2018_SA*, UTS2017_BANK, VLSP2016_SA*, VNTC

  • Plaintext: VNESES, VNTQ_SMALL, VNTQ_BIG

Caution (*): For datasets with a closed license, you must provide the download URL yourself.
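The list above can be kept as a small lookup table to check, before downloading, whether a dataset is marked with (*). This is only a sketch built from the names in this README; the `DATASETS` table and the `needs_url` helper are hypothetical and not part of LanguageFlow.

```python
# Dataset names copied from the list above; a trailing "*" marks a closed license.
DATASETS = {
    "Tagged": ["VLSP2018-NER", "VTB-CHUNK*", "VLSP2016-NER*", "VLSP2013-POS*", "VLSP2013-WTK*"],
    "Categorized": ["AIVIVN2019_SA*", "VLSP2018_SA*", "UTS2017_BANK", "VLSP2016_SA*", "VNTC"],
    "Plaintext": ["VNESES", "VNTQ_SMALL", "VNTQ_BIG"],
}


def needs_url(name):
    """Return True if the named dataset is marked with (*) in the table."""
    for entries in DATASETS.values():
        for entry in entries:
            if entry.rstrip("*") == name:
                return entry.endswith("*")
    raise KeyError(name)


# Datasets that carry no (*) marker.
open_datasets = [
    entry
    for entries in DATASETS.values()
    for entry in entries
    if not entry.endswith("*")
]
```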

Example

Download the UTS2017_BANK dataset

$ languageflow download UTS2017_BANK

Use the UTS2017_BANK dataset

>>> from languageflow.data_fetcher import DataFetcher, NLPData
>>> corpus = DataFetcher.load_corpus(NLPData.UTS2017_BANK_SA)
>>> print(corpus)
CategorizedCorpus: 1780 train + 197 dev + 494 test sentences
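The printed summary gives the size of each split. As a rough sketch, the line can be parsed to get the total and the share of each split; the regular expression assumes exactly the output format shown above, and `parse_corpus_summary` is a hypothetical helper, not a LanguageFlow function.

```python
import re


def parse_corpus_summary(summary):
    """Extract train/dev/test sizes from a line like the one printed above."""
    match = re.search(r"(\d+) train \+ (\d+) dev \+ (\d+) test", summary)
    if match is None:
        raise ValueError("unrecognized summary: " + summary)
    train, dev, test = (int(group) for group in match.groups())
    return {"train": train, "dev": dev, "test": test}


sizes = parse_corpus_summary(
    "CategorizedCorpus: 1780 train + 197 dev + 494 test sentences"
)
total = sum(sizes.values())  # 2471 sentences in all
# Percentage of the corpus in each split, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in sizes.items()}
```

For this corpus the split works out to roughly 72% train, 8% dev, and 20% test.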

History

1.1.7 (2018-04-12)

  • Automatic deployment with Travis and PyPI

  • Fix dependency hell

1.1.6 (2017-12-26)

  • Add data module to handle data downloading and data preprocessing

  • Add many new models: SGDClassifier, XGBoostClassifier, FastTextClassifier, CRF

  • Add new feature: LanguageBoard

  • Automatic continuous integration with travis-ci

  • Build docs with readthedocs.org

1.1.5 (2017-12-11)

  • Refactor project to integrate with underthesea experiment

0.1.0 (2017-09-18)

  • First release on PyPI.
