Easy Peasy Language Squeezy

Project description

NLPeasy

Build NLP pipelines the easy way

Disclaimer: This project is in alpha stage and a lot of things can go wrong. It could interfere with your Docker containers and change your Elasticsearch data!

Also, the API is very unstable and even the name NLPeasy might change soon.

Free software: Apache Software License 2.0

Installation

Prerequisites:

Python 3 (we use Python 3.7)
Elastic: several options
- Have Docker installed; this also needs the docker package installed (see below).
- Install and start Elasticsearch and Kibana: https://www.elastic.co/downloads/ or https://www.elastic.co/downloads/elasticsearch-oss (pure Apache-licensed version)
- Use any running Elasticsearch and Kibana (on premise or cloud).
Pretrained models: see below for spaCy language models and word vectors

It is recommended to use a virtual environment:

cd $PROJECT_DIR
python -m venv venv
source venv/bin/activate

The source statement has to be repeated whenever you open a new terminal.

Then install NLPeasy:

pip install nlpeasy

Or the development version from GitHub:

pip install --upgrade git+https://github.com/d-one/nlpeasy

If you want to use spaCy language models, download them (90-200 MB each), e.g.

python -m spacy download en_core_web_md
# and/or
python -m spacy download de_core_news_md
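spaCy models downloaded this way are installed as regular Python packages, so their presence can be checked before a pipeline is built. The following is a minimal sketch using only the standard library; the helper name model_installed is hypothetical and not part of NLPeasy or spaCy.

```python
import importlib.util


def model_installed(name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(name) is not None


# Report which of the two models mentioned above are available.
for model in ("en_core_web_md", "de_core_news_md"):
    status = "installed" if model_installed(model) else "missing"
    print(f"{model}: {status}")
```

The check only looks up the package and does not load the model, so it is cheap to run at the start of a script.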

If you want to use pretrained fastText word vectors (each ~7 GB):

curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.zip
curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.de.zip

If you want to use Jupyter, install it to the virtual environment:

pip install jupyterlab

Development

To install this module in development mode, so that you can change files and reload the module:

git clone https://github.com/d-one/nlpeasy
cd nlpeasy

It is recommended to use a virtual environment:

python -m venv venv
source venv/bin/activate

Install the package in editable mode:

pip install -e .

In Jupyter, code can be reloaded automatically when you change the files:

%load_ext autoreload
%autoreload 2

Usage

import pandas as pd
import nlpeasy as ne

# Connect to a running Elastic instance, or else start an open-source stack in Docker.
elk = ne.connect_elastic(dockerPrefix='nlp', elkVersion='7.4.0', mountVolumePrefix=None)
# If it is started in Docker, the first run pulls the images (1.3 GB)!
# Note that this function does not block, i.e. the servers might only become active a couple of seconds later.
# Setting mountVolumePrefix="./elastic-data/" keeps the Elastic data on your
# filesystem, so that the data survives container restarts.

# read data as Pandas data frame
nips = pd.read_pickle("data_raw/nips.pickle")

# setup stages in the NLP pipeline and set textfields
pipeline = ne.Pipeline(index='nips', textCols=['message','title'], dateCol='year', elk=elk)

pipeline += ne.RegexTag(r'\$([^$]+)\$', ['message'], 'math')
pipeline += ne.VaderSentiment('message', 'sentiment')
pipeline += ne.SpacyEnrichment(cols=['message','title'])

# run the pipeline
nips_enriched = pipeline.process(nips, writeElastic=True)

# Create Kibana Dashboard of all the columns
pipeline.create_kibana_dashboard()

# open Kibana in the web browser
elk.show_kibana()
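Because connect_elastic does not block, a script may want to wait until the server answers before writing to it. The following is a sketch of a generic polling helper, not part of NLPeasy; the names wait_until and http_is_up are hypothetical, and the address used in the note below assumes Elasticsearch's default HTTP port on the local machine.

```python
import http.client
import time
import urllib.request


def wait_until(check, timeout=60.0, interval=1.0):
    """Call check() repeatedly until it returns True or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def http_is_up(url):
    """Return True if the URL answers an HTTP request without an error."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are subclasses of OSError.
        return False
```

For example, wait_until(lambda: http_is_up("http://localhost:9200"), timeout=120) would poll once per second for up to two minutes and return False if the server never responds. A server that requires authentication answers with an HTTP error and would therefore count as not up in this sketch.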

Features

Pandas based pipeline
Support for custom extensions; currently includes some for regex, spaCy and VaderSentiment
Write results to Elasticsearch
Automatic Kibana dashboard generation
Start Elastic in Docker if it is not available locally or remotely
Apache License 2.0
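To illustrate the regex extension, the pattern passed to RegexTag in the usage example, r'\$([^$]+)\$', captures text enclosed in dollar signs, such as inline LaTeX math. The sketch below uses only Python's re module to show what the pattern matches; it does not show how NLPeasy stores the result, and the function name extract_math is hypothetical.

```python
import re

# Same pattern as in the usage example: text between two dollar signs.
MATH_PATTERN = re.compile(r'\$([^$]+)\$')


def extract_math(text):
    """Return all substrings enclosed in dollar signs."""
    return MATH_PATTERN.findall(text)


sample = "We minimise $f(x) = x^2$ subject to $x > 0$."
print(extract_math(sample))  # ['f(x) = x^2', 'x > 0']
```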

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage (https://github.com/audreyr/cookiecutter-pypackage) project template.

Release history

0.7.0 (Nov 24, 2019)
0.6.2 (Oct 13, 2019)
0.6.1 (Oct 11, 2019), this version
0.6.0 (Oct 10, 2019)

Download files

Source Distribution

nlpeasy-0.6.1.tar.gz (22.5 kB)

Uploaded Oct 11, 2019 Source

Hashes for nlpeasy-0.6.1.tar.gz
Algorithm	Hash digest
SHA256	`23843df46d03e7e480676e3c872515d68d93a693cb57e3cbd7ea6b339bdb4442`
MD5	`ccad599cc5ff48bb736e184bb18678a1`
BLAKE2b-256	`6628dc0af2361fbe3fa6df92075b754affd5c4db3ab8f7b73eb841f5a2de4f6f`
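
The digests above can be used to check a downloaded archive. The following is a minimal sketch with Python's hashlib; it assumes the file has already been downloaded, and the helper names are hypothetical.

```python
import hashlib

# SHA256 digest listed above for nlpeasy-0.6.1.tar.gz.
EXPECTED_SHA256 = "23843df46d03e7e480676e3c872515d68d93a693cb57e3cbd7ea6b339bdb4442"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected
```

Calling matches_published_digest("nlpeasy-0.6.1.tar.gz") on the downloaded archive should return True if the file is intact.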