
Elegant tweet preprocessing

Project description

preprocessor

Preprocessor is a preprocessing library for tweet data written in Python.

When building machine learning systems based on tweet data, the raw tweets need to be preprocessed first. This library makes it easy to clean, parse, or tokenize them.
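To illustrate what "cleaning" a tweet typically means, here is a minimal sketch that uses only the standard `re` module. It is not this library's implementation. The function name `clean_tweet` and the three regular expressions are hypothetical and chosen for illustration only.

```python
import re

# Hypothetical patterns for illustration; not the library's own regexes.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(text: str) -> str:
    """Remove URLs, hashtags and mentions, then collapse whitespace."""
    # URLs are removed first so that a "#fragment" inside a URL
    # is not handled separately as a hashtag.
    for pattern in (URL_RE, HASHTAG_RE, MENTION_RE):
        text = pattern.sub("", text)
    # Collapse the runs of whitespace left behind by the removals.
    return " ".join(text.split())


print(clean_tweet("Preprocessor is #awesome https://github.com/s/preprocessor"))
# Preprocessor is
```

These simple patterns are deliberately naive. For example, the mention pattern also matches the `@domain` part of an email address.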

Installation

Using pip:

$ pip install tweet-preprocessor

Usage

import preprocessor as p
cleaned_tweet = p.clean("Preprocessor is #awesome https://github.com/s/preprocessor")

print(cleaned_tweet)
# Preprocessor is

tokenized_tweet = p.tokenize("Preprocessor is #awesome https://github.com/s/preprocessor")

print(tokenized_tweet)
# Preprocessor is $HASHTAG$ $URL$
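Tokenization, as shown above, replaces each matched entity with a placeholder instead of deleting it. The following is a minimal standard-library sketch of that idea, not the library's own code. The function `tokenize_tweet`, its patterns, and the `$MENTION$` placeholder are hypothetical. Only `$HASHTAG$` and `$URL$` appear in the output above.

```python
import re

# Hypothetical placeholder scheme modeled on the output shown above.
# URLs are matched first so a "#fragment" inside a URL is not tagged as a hashtag.
TOKEN_PATTERNS = [
    (re.compile(r"https?://\S+"), "$URL$"),
    (re.compile(r"#\w+"), "$HASHTAG$"),
    (re.compile(r"@\w+"), "$MENTION$"),  # assumed placeholder, not from the source
]


def tokenize_tweet(text: str) -> str:
    """Replace URLs, hashtags and mentions with placeholder tokens."""
    for pattern, placeholder in TOKEN_PATTERNS:
        # "$" has no special meaning in a re.sub replacement string.
        text = pattern.sub(placeholder, text)
    return text


print(tokenize_tweet("Preprocessor is #awesome https://github.com/s/preprocessor"))
# Preprocessor is $HASHTAG$ $URL$
```

Keeping placeholders rather than deleting the entities lets a downstream model still see that a tweet contained a link or a hashtag, without learning from the specific (and often unique) URL strings.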



Download files


Source Distribution

tweet-preprocessor-0.1.2.tar.gz (2.8 kB)

