textblob

Simple, Pythonic text processing. Sentiment analysis, POS tagging, noun phrase parsing, and more.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

TextBlob: Simplified Text Processing

Homepage: https://textblob.readthedocs.org/

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)
blob.tags           # [(u'The', u'DT'), (u'titular', u'JJ'),
                    #  (u'threat', u'NN'), (u'of', u'IN'), ...]

blob.noun_phrases   # WordList(['titular threat', 'blob',
                    #            'ultimate movie monster',
                    #            'amoeba-like mass', ...])

for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
# 0.060
# -0.341

blob.translate(to="es")  # 'La amenaza titular de The Blob...'

TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.

Features

Noun phrase extraction
Part-of-speech tagging
Sentiment analysis
Classification (Naive Bayes, Decision Tree)
Language translation and detection powered by Google Translate
Tokenization (splitting text into words and sentences)
Word and phrase frequencies
Parsing
n-grams
Word inflection (pluralization and singularization) and lemmatization
Spelling correction
JSON serialization
Add new models or languages through extensions
WordNet integration

Get it now

$ pip install -U textblob
$ python -m textblob.download_corpora

Examples

See more examples at the Quickstart guide.

Documentation

Full documentation is available at https://textblob.readthedocs.org/.

Requirements

Python >= 2.6 or >= 3.3

License

MIT licensed. See the bundled LICENSE file for more details.

Changelog

0.8.4 (2014-02-02)

Fix display (__repr__) of WordList slices on Python 3.
Add download_corpora module. Corpora must now be downloaded using python -m textblob.download_corpora.

0.8.3 (2013-12-29)

Sentiment analyzers return namedtuples, e.g. Sentiment(polarity=0.12, subjectivity=0.34).
Memory usage improvements to NaiveBayesAnalyzer and basic_extractor (default feature extractor for classifiers module).
Add textblob.tokenizers.sent_tokenize and textblob.tokenizers.word_tokenize convenience functions.
Add textblob.classifiers.MaxEntClassifer.
Improved NLTKTagger.

0.8.2 (2013-12-21)

Fix bug in spelling correction that stripped some punctuation (Issue #48).
Various improvements to spelling correction: preserves whitespace characters (Issue #12); handle contractions and punctuation between words. Thanks @davidnk.
Make TextBlob.words more memory-efficient.
Translator now sends POST instead of GET requests. This allows for larger bodies of text to be translated (Issue #49).
Update pattern tagger for better accuracy.

0.8.1 (2013-11-16)

Fix bug that caused ValueError upon sentence tokenization. This removes modifications made to the NLTK sentence tokenizer.
Add Word.lemmatize() method that allows passing in a part-of-speech argument.
Word.lemma returns correct part of speech for Word objects that have their pos attribute set. Thanks @RomanYankovsky.

0.8.0 (2013-10-23)

Backwards-incompatible: Renamed package to textblob. This avoids clashes with other namespaces called text. TextBlob should now be imported with from textblob import TextBlob.
Update pattern resources for improved parser accuracy.
Update NLTK.
Allow Translator to connect to proxy server.
PerceptronTagger completely deprecated. Install the textblob-aptagger extension instead.

0.7.1 (2013-09-30)

Bugfix updates.
Fix bug in feature extraction for NaiveBayesClassifier.
basic_extractor is now case-sensitive, e.g. contains(I) != contains(i)
Fix repr output when a TextBlob contains non-ascii characters.
Fix part-of-speech tagging with PatternTagger on Windows.
Suppress warning about not having scikit-learn installed.

0.7.0 (2013-09-25)

Wordnet integration. Word objects have synsets and definitions properties. The text.wordnet module allows you to create Synset and Lemma objects directly.
Move all English-specific code to its own module, text.en.
Basic extensions framework in place. TextBlob has been refactored to make it easier to develop extensions.
Add text.classifiers.PositiveNaiveBayesClassifier.
Update NLTK.
NLTKTagger now working on Python 3.
Fix __str__ behavior. print(blob) should now print non-ascii text correctly in both Python 2 and 3.
Backwards-incompatible: All abstract base classes have been moved to the text.base module.
Backwards-incompatible: PerceptronTagger will now be maintained as an extension, textblob-aptagger. Instantiating a text.taggers.PerceptronTagger() will raise a DeprecationWarning.

0.6.3 (2013-09-15)

Word tokenization fix: Words that stem from a contraction will still have an apostrophe, e.g. "Let's" => ["Let", "'s"].
Fix bug with comparing blobs to strings.
Add text.taggers.PerceptronTagger, a fast and accurate POS tagger. Thanks @syllog1sm.
Note for Python 3 users: You may need to update your corpora, since NLTK master has reorganized its corpus system. Just run curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python again.
Add download_corpora_lite.py script for getting the minimum corpora requirements for TextBlob’s basic features.

0.6.2 (2013-09-05)

Fix bug that resulted in a UnicodeEncodeError when tagging text with non-ascii characters.
Add DecisionTreeClassifier.
Add labels() and train() methods to classifiers.

0.6.1 (2013-09-01)

Classifiers can be trained and tested on CSV, JSON, or TSV data.
Add basic WordNet lemmatization via the Word.lemma property.
WordList.pluralize() and WordList.singularize() methods return WordList objects.

0.6.0 (2013-08-25)

Add Naive Bayes classification. New text.classifiers module, TextBlob.classify(), and Sentence.classify() methods.
Add parsing functionality via the TextBlob.parse() method. The text.parsers module currently has one implementation (PatternParser).
Add spelling correction. This includes the TextBlob.correct() and Word.spellcheck() methods.
Update NLTK.
Backwards incompatible: clean_html has been deprecated, just as it has in NLTK. Use Beautiful Soup’s soup.get_text() method for HTML-cleaning instead.
Slight API change to language translation: if from_lang isn’t specified, attempts to detect the language.
Add itokenize() method to tokenizers that returns a generator instead of a list of tokens.

0.5.3 (2013-08-21)

Unicode fixes: This fixes a bug that sometimes raised a UnicodeEncodeError upon creating accessing sentences for TextBlobs with non-ascii characters.
Update NLTK

0.5.2 (2013-08-14)

Important patch update for NLTK users: Fix bug with importing TextBlob if local NLTK is installed.
Fix bug with computing start and end indices of sentences.

0.5.1 (2013-08-13)

Fix bug that disallowed display of non-ascii characters in the Python REPL.
Backwards incompatible: Restore blob.json property for backwards compatibility with textblob<=0.3.10. Add a to_json() method that takes the same arguments as json.dumps.
Add WordList.append and WordList.extend methods that append Word objects.

0.5.0 (2013-08-10)

Language translation and detection API!
Add text.sentiments module. Contains the PatternAnalyzer (default implementation) as well as a NaiveBayesAnalyzer.
Part-of-speech tags can be accessed via TextBlob.tags or TextBlob.pos_tags.
Add polarity and subjectivity helper properties.

0.4.0 (2013-08-05)

New text.tokenizers module with WordTokenizer and SentenceTokenizer. Tokenizer instances (from either textblob itself or NLTK) can be passed to TextBlob’s constructor. Tokens are accessed through the new tokens property.
New Blobber class for creating TextBlobs that share the same tagger, tokenizer, and np_extractor.
Add ngrams method.
Backwards-incompatible: TextBlob.json() is now a method, not a property. This allows you to pass arguments (the same that you would pass to json.dumps()).
New home for documentation: https://textblob.readthedocs.org/
Add parameter for cleaning HTML markup from text.
Minor improvement to word tokenization.
Updated NLTK.
Fix bug with adding blobs to bytestrings.

0.3.10 (2013-08-02)

Bundled NLTK no longer overrides local installation.
Fix sentiment analysis of text with non-ascii characters.

0.3.9 (2013-07-31)

Updated nltk.
ConllExtractor is now Python 3-compatible.
Improved sentiment analysis.
Blobs are equal (with ==) to their string counterparts.
Added instructions to install textblob without nltk bundled.
Dropping official 3.1 and 3.2 support.

0.3.8 (2013-07-30)

Importing TextBlob is now much faster. This is because the noun phrase parsers are trained only on the first call to noun_phrases (instead of training them every time you import TextBlob).
Add text.taggers module which allows user to change which POS tagger implementation to use. Currently supports PatternTagger and NLTKTagger (NLTKTagger only works with Python 2).
NPExtractor and Tagger objects can be passed to TextBlob’s constructor.
Fix bug with POS-tagger not tagging one-letter words.
Rename text/np_extractor.py -> text/np_extractors.py
Add run_tests.py script.

0.3.7 (2013-07-28)

Every word in a Blob or Sentence is a Word instance which has methods for inflection, e.g word.pluralize() and word.singularize().
Updated the np_extractor module. Now has an new implementation, ConllExtractor that uses the Conll2000 chunking corpus. Only works on Py2.

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

0.18.0.post0

Feb 15, 2024

0.18.0

Feb 15, 2024

0.17.1

Oct 22, 2021

0.17.0

Oct 22, 2021

0.15.3

Feb 24, 2019

0.15.2

Nov 21, 2018

0.15.1

Jan 20, 2018

0.15.0

Dec 2, 2017

0.14.0

Nov 21, 2017

0.13.1

Nov 11, 2017

0.13.0

Aug 16, 2017

0.12.0

Feb 27, 2017

0.11.1

Feb 18, 2016

0.11.0

Nov 1, 2015

0.10.0

Oct 4, 2015

0.9.1

Jun 10, 2015

0.9.0

Sep 15, 2014

This version

0.8.4

Feb 3, 2014

0.8.3

Dec 29, 2013

0.8.2

Dec 21, 2013

0.8.1

Nov 16, 2013

0.8.0

Oct 23, 2013

0.7.1

Sep 30, 2013

0.7.0

Sep 25, 2013

0.6.3

Sep 15, 2013

0.6.2

Sep 5, 2013

0.6.1

Sep 1, 2013

0.6.0

Aug 26, 2013

0.5.3

Aug 21, 2013

0.5.2

Aug 14, 2013

0.5.1

Aug 13, 2013

0.5.0

Aug 10, 2013

0.4.0

Aug 6, 2013

0.3.10

Aug 2, 2013

0.3.9

Jul 31, 2013

0.3.8

Jul 30, 2013

0.3.7

Jul 29, 2013

0.3.6

Jul 28, 2013

0.3.5

Jul 19, 2013

0.3.4

Jul 7, 2013

0.3.3

Jul 7, 2013

0.3.2

Jul 7, 2013

0.3.1

Jul 7, 2013

0.3.0

Jul 7, 2013

0.2.6

Jul 5, 2013

0.2.5

Jul 2, 2013

0.2.4

Jul 1, 2013

0.2.3

Jul 1, 2013

0.2.1

Jul 1, 2013

0.2.0

Jul 1, 2013

0.1.36

Jul 1, 2013

0.1.35

Jul 1, 2013

0.1.34

Jul 1, 2013

0.1.33

Jul 1, 2013

0.1.32

Jul 1, 2013

0.1.31

Jul 1, 2013

0.1.3

Jul 1, 2013

0.1.2

Jul 1, 2013

0.1

Jul 1, 2013

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

textblob-0.8.4.tar.gz (1.8 MB view hashes)

Uploaded Feb 3, 2014 Source

Built Distribution

textblob-0.8.4-py2.py3-none-any.whl (2.0 MB view hashes)

Uploaded Feb 3, 2014 Python 2 Python 3

Hashes for textblob-0.8.4.tar.gz

Hashes for textblob-0.8.4.tar.gz
Algorithm	Hash digest
SHA256	`8da25dc77a1da8faea3a72775fe4cdd2f095eef45b83ba06e6026524a685d6f4`
MD5	`c170abbbb950ac2f838e3a852a9c5c77`
BLAKE2b-256	`07cfa13616e46cdddd97c499c33bc223ce6ba43c8c23e36bfd6219e976ad2ec1`

Hashes for textblob-0.8.4-py2.py3-none-any.whl

Hashes for textblob-0.8.4-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`009a604580f0c579a422f13a43422e8331e26ea522e8f724b6e8079f8cd10d8c`
MD5	`66006c8a73b6a6a4e71f1c951f58d2d8`
BLAKE2b-256	`c9485a1a090791e573573815b6d67b47f47623a9375c1996e3c27f6886a93b81`

textblob 0.8.4

Navigation

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Project description

TextBlob: Simplified Text Processing

Features

Get it now

Examples

Documentation

Requirements

License

Changelog

0.8.4 (2014-02-02)

0.8.3 (2013-12-29)

0.8.2 (2013-12-21)

0.8.1 (2013-11-16)

0.8.0 (2013-10-23)

0.7.1 (2013-09-30)

0.7.0 (2013-09-25)

0.6.3 (2013-09-15)

0.6.2 (2013-09-05)

0.6.1 (2013-09-01)

0.6.0 (2013-08-25)

0.5.3 (2013-08-21)

0.5.2 (2013-08-14)

0.5.1 (2013-08-13)

0.5.0 (2013-08-10)

0.4.0 (2013-08-05)

0.3.10 (2013-08-02)

0.3.9 (2013-07-31)

0.3.8 (2013-07-30)

0.3.7 (2013-07-28)

Project details

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution