Real-time sentiment analysis for Twitter.

Introduction
============

Tweetfeels relies on `VADER sentiment
analysis <https://github.com/cjhutto/vaderSentiment>`__ to provide
sentiment scores for user-defined topics. It does this by using
Twitter's streaming API to listen to real-time tweets about a
particular topic. Some possible applications include:

* Calculating the social sentiment of particular political figures or
  issues and analyzing scores across geographic regions.
* Calculating sentiment scores for brands.
* Using sentiment scores as training features for a learning algorithm
  to determine stock buy and sell triggers.
* And more!

Install Methods
===============

1. The easiest way is to install from PyPI:

::

    > pip3 install tweetfeels

2. If you've installed from PyPI and want to upgrade:

::

    > pip3 install --upgrade tweetfeels

3. You can also install by cloning this repo:

::

    > git clone https://github.com/uclatommy/tweetfeels.git
    > cd tweetfeels
    > python3 setup.py install

Additional Requirements
-----------------------

1. You will need to obtain Twitter OAuth keys and supply them to
   tweetfeels in order to connect to Twitter's streaming API. Go
   `here <https://twittercommunity.com/t/how-to-get-my-api-key/7033>`__
   for instructions on how to obtain your keys.

2. Python 3.6 or newer.

3. If for some reason pip did not install the VADER lexicon, download
   it manually:

::

    > python3 -m nltk.downloader vader_lexicon

Examples
========

*Note: Authorization keys in the examples are masked for privacy.*

For all examples, we use a few common boilerplate lines:

.. code:: python

    from tweetfeels import TweetFeels

    consumer_key = '*************************'
    consumer_secret = '**************************************************'
    access_token = '**************************************************'
    access_token_secret = '*********************************************'
    login = [consumer_key, consumer_secret, access_token, access_token_secret]
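
If you would rather not paste keys into source files, one option is to read them from environment variables. The sketch below is only an illustration: the variable names and the ``load_login`` helper are hypothetical and not part of tweetfeels. It returns the four values in the same order as the ``login`` list above.

```python
import os

# Hypothetical variable names; use whatever names you export in your shell.
KEY_NAMES = (
    'TWITTER_CONSUMER_KEY',
    'TWITTER_CONSUMER_SECRET',
    'TWITTER_ACCESS_TOKEN',
    'TWITTER_ACCESS_TOKEN_SECRET',
)


def load_login(environ=os.environ):
    """Return the four OAuth values in the order the boilerplate uses."""
    missing = [name for name in KEY_NAMES if not environ.get(name)]
    if missing:
        # Fail early with a readable message instead of a failed connection.
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return [environ[name] for name in KEY_NAMES]
```

With the variables exported, ``login = load_login()`` can replace the hard-coded list, and the rest of the examples stay the same.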

Stream tweets related to keyword "Trump" for 10 seconds, then calculate a sentiment score for the last 10 seconds.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    >>> trump_feels = TweetFeels(login, tracking=['trump'])
    >>> trump_feels.start(10)
    Timer completed. Disconnecting now...
    >>> trump_feels.sentiment.value
    -0.0073007430343252711

Stream tweets continuously and print current sentiment score every 10 seconds
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    >>> from threading import Thread
    >>> import time
    >>>
    >>> def print_feels(feels, seconds=10):
    ...     while go_on:
    ...         time.sleep(seconds)
    ...         print(f'[{time.ctime()}] Sentiment Score: {feels.sentiment.value}')
    ...
    >>> go_on = True
    >>> t = Thread(target=print_feels, args=(trump_feels,))
    >>> trump_feels.start()
    >>> t.start()
    [Mon Feb 20 23:42:02 2017] Sentiment Score: -0.010528112416665309
    [Mon Feb 20 23:42:13 2017] Sentiment Score: -0.007496043169013409
    [Mon Feb 20 23:42:25 2017] Sentiment Score: -0.015294713038619036
    [Mon Feb 20 23:42:36 2017] Sentiment Score: -0.030362951884842962
    [Mon Feb 20 23:42:48 2017] Sentiment Score: -0.042087318872206333
    [Mon Feb 20 23:42:59 2017] Sentiment Score: -0.041308681936680865
    [Mon Feb 20 23:43:10 2017] Sentiment Score: -0.056203371039128994
    [Mon Feb 20 23:43:22 2017] Sentiment Score: -0.07374769163753854
    [Mon Feb 20 23:43:34 2017] Sentiment Score: -0.09549338153348486
    [Mon Feb 20 23:43:46 2017] Sentiment Score: -0.10943157911799692
    [Mon Feb 20 23:43:57 2017] Sentiment Score: -0.1406756546353098
    [Mon Feb 20 23:44:08 2017] Sentiment Score: -0.12366467180485821
    [Mon Feb 20 23:44:20 2017] Sentiment Score: -0.14460675229624026
    [Mon Feb 20 23:44:32 2017] Sentiment Score: -0.13149386547613803
    [Mon Feb 20 23:44:43 2017] Sentiment Score: -0.14568801433828418
    [Mon Feb 20 23:44:55 2017] Sentiment Score: -0.14505295656838593
    [Mon Feb 20 23:45:06 2017] Sentiment Score: -0.12853750933261338
    [Mon Feb 20 23:45:17 2017] Sentiment Score: -0.11649611157554504
    [Mon Feb 20 23:45:29 2017] Sentiment Score: -0.11382260762980569
    [Mon Feb 20 23:45:40 2017] Sentiment Score: -0.11121839471955856
    [Mon Feb 20 23:45:52 2017] Sentiment Score: -0.11083390577340985
    [Mon Feb 20 23:46:03 2017] Sentiment Score: -0.10879727669948112
    [Mon Feb 20 23:46:15 2017] Sentiment Score: -0.10137079133168492
    [Mon Feb 20 23:46:26 2017] Sentiment Score: -0.10075971619875508
    [Mon Feb 20 23:46:38 2017] Sentiment Score: -0.1194907722483259
    [Mon Feb 20 23:46:49 2017] Sentiment Score: -0.1328795394197093
    [Mon Feb 20 23:47:01 2017] Sentiment Score: -0.13734346200202507
    [Mon Feb 20 23:47:12 2017] Sentiment Score: -0.1157629833027525
    [Mon Feb 20 23:47:24 2017] Sentiment Score: -0.11030256885649424
    [Mon Feb 20 23:47:35 2017] Sentiment Score: -0.12185876174059834
    [Mon Feb 20 23:47:47 2017] Sentiment Score: -0.11323251979604802
    [Mon Feb 20 23:47:58 2017] Sentiment Score: -0.11307793897469191
    >>> trump_feels.stop()

**Note:** Trump is an extremely high-volume topic. We ran this for
roughly 6 minutes and gathered nearly 15,000 tweets! For lower-volume
topics, you may want to poll the sentiment value less frequently than
every 10 seconds.
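
The polling loop above is controlled by a module-level ``go_on`` flag. As an alternative, here is a minimal sketch that uses ``threading.Event`` so the interval is configurable and the loop can be stopped promptly. The ``poll`` and ``start_polling`` helpers are hypothetical names for this illustration, not part of tweetfeels; ``read_value`` is any zero-argument callable that returns the current score.

```python
import threading
import time


def poll(read_value, seconds, stop, report=print):
    """Report a fresh reading every `seconds` until `stop` is set."""
    # Event.wait returns True as soon as the event is set, so the loop
    # ends promptly instead of sleeping through a full interval.
    while not stop.wait(seconds):
        report(f'[{time.ctime()}] Sentiment Score: {read_value()}')


def start_polling(read_value, seconds=60, report=print):
    """Start the polling thread and return (thread, stop_event)."""
    stop = threading.Event()
    thread = threading.Thread(
        target=poll, args=(read_value, seconds, stop, report), daemon=True)
    thread.start()
    return thread, stop
```

For a quiet topic you might call ``thread, stop = start_polling(lambda: trump_feels.sentiment.value, seconds=120)`` and later end the loop with ``stop.set()`` followed by ``thread.join()``.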

Stream tweets continuously for another topic and save to a different database.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    >>> tesla_feels = TweetFeels(login, tracking=['tesla', 'tsla', 'gigafactory', 'elonmusk'], db='tesla.sqlite')
    >>> t = Thread(target=print_feels, args=(tesla_feels, 120))
    >>> tesla_feels.start()
    >>> t.start()
    [Mon Feb 20 17:39:15 2017] Sentiment Score: 0.03347735418362685
    [Mon Feb 20 17:41:15 2017] Sentiment Score: 0.09408120307200825
    [Mon Feb 20 17:43:15 2017] Sentiment Score: 0.12554072120979093
    [Mon Feb 20 17:45:16 2017] Sentiment Score: 0.12381491277579157
    [Mon Feb 20 17:47:16 2017] Sentiment Score: 0.17121666657137832
    [Mon Feb 20 17:49:16 2017] Sentiment Score: 0.22588283902409384
    [Mon Feb 20 17:51:16 2017] Sentiment Score: 0.23587583668725887
    [Mon Feb 20 17:53:16 2017] Sentiment Score: 0.2485916177213093

Use the sentiments generator to replay captured data and plot
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    import pandas as pd
    from datetime import timedelta
    import matplotlib.pyplot as plt

    # Replay the captured tweets in 15-minute bins, once for each series.
    bins = timedelta(minutes=15)
    data1 = {s.end: s.value for s in tesla_feels.sentiments(delta_time=bins, nans=True)}
    data2 = {s.end: s.volume for s in tesla_feels.sentiments(delta_time=bins, nans=True)}
    df1 = pd.DataFrame.from_dict(data1, orient='index')
    df2 = pd.DataFrame.from_dict(data2, orient='index')

    # Sentiment on top, tweet volume below.
    fig, axes = plt.subplots(nrows=2, ncols=1)
    fig.set_size_inches(15, 5)
    axes[0].get_xaxis().set_visible(False)
    df1[0].plot(kind='line', title='Tesla Sentiment', ax=axes[0])
    df2[0].plot(kind='area', title='Volume', ax=axes[1])

Methodology
===========

There are many ways to combine hundreds or thousands of tweets across
time into a single sentiment score. One naive method is to bin tweets
into discretized time-boxes. For example, you might average the
individual sentiment scores every 10 seconds so that the current
sentiment is the average over the last 10 seconds. In this method, your
choice of discretization length is arbitrary and will have an impact on
the perceived variance of the score. It also disregards any past
sentiment calculations.
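
To make the effect of the bin width concrete, here is a small sketch of the naive method. The ``binned_means`` helper and the sample scores are made up for this illustration and are not part of tweetfeels.

```python
from collections import defaultdict


def binned_means(scored_tweets, width):
    """Average scores within consecutive time-boxes of `width` seconds.

    `scored_tweets` is an iterable of (timestamp_in_seconds, score) pairs.
    Returns a dict mapping each bin's start time to its mean score.
    """
    bins = defaultdict(list)
    for timestamp, score in scored_tweets:
        bins[(timestamp // width) * width].append(score)
    return {start: sum(scores) / len(scores)
            for start, scores in sorted(bins.items())}


# Made-up sample: (seconds since start, sentiment score).
sample = [(1, 0.8), (4, -0.6), (12, 0.4), (18, 0.2), (25, -0.5), (38, -0.9)]

narrow = binned_means(sample, 10)  # four 10-second boxes
wide = binned_means(sample, 20)    # two 20-second boxes
```

The same six tweets give four values with 10-second boxes and only two with 20-second boxes, and each box is computed with no memory of the ones before it.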

To correct for these effects, we time-box every minute by default and do
not discard the sentiment from prior calculations. Instead, we phase out
older tweet sentiments geometrically as we add in new tweets:

.. math::

   S_t = \alpha S_{t-1} + (1 - \alpha) s_t

Here, |f2| is the aggregate sentiment at time t, |f3| is the sentiment
score for the current time-box, and |f5| is the fall-off factor between
0 and 1. We start the calculation with |f4|, which is why you will see
the sentiment score move away from zero until it stabilizes around its
natural value. Within each time-box we use a weighted average of
sentiment scores, weighting each tweet by its author's follower and
friend counts as a measure of influence.

Some tweets will have a neutral score (0.0). We exclude these from the
aggregation.
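
The following sketch puts the pieces of this section together. It is an illustration of the recurrence described here, not the code tweetfeels runs: the function names, the use of followers plus friends as the weight, and the choice to leave the score unchanged for an empty time-box are all assumptions.

```python
def box_sentiment(tweets):
    """Influence-weighted mean of one time-box, skipping neutral tweets.

    `tweets` is a list of (score, followers, friends) tuples. Using
    followers + friends as the weight is an assumption of this sketch.
    Returns None when the box has nothing to aggregate.
    """
    weighted = [(score, followers + friends)
                for score, followers, friends in tweets
                if score != 0.0]
    total = sum(weight for _, weight in weighted)
    if total == 0:
        return None
    return sum(score * weight for score, weight in weighted) / total


def aggregate(boxes, alpha):
    """Apply S_t = alpha * S_(t-1) + (1 - alpha) * s_t, starting at S_0 = 0."""
    s = 0.0
    history = []
    for tweets in boxes:
        s_t = box_sentiment(tweets)
        if s_t is not None:  # assumption: an empty box leaves S unchanged
            s = alpha * s + (1 - alpha) * s_t
        history.append(s)
    return history


# Made-up data: three time-boxes, the second one empty.
boxes = [
    [(0.5, 100, 50), (0.0, 10, 10), (-0.5, 25, 25)],
    [],
    [(0.8, 10, 10)],
]
history = aggregate(boxes, alpha=0.5)
```

A value of alpha close to 1 makes the score react slowly to new time-boxes, while a value close to 0 makes it track the latest box almost exactly.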

Here's an example of different model parameterizations of real-time
Tesla sentiment:

Caveats
-------

The trained dataset that comes with
`vaderSentiment <https://github.com/cjhutto/vaderSentiment>`__ is
optimized for social media, so it can recognize the sentiment embedded
in neologisms, internet shorthand, and even emoticons. However, it can
only measure the aggregate sentiment value of a sentence or group of
words. It does not measure whether a tweet agrees or disagrees with a
particular ideology, political figure, or party, although statements of
disagreement do generally tend to have a negative sentiment. As an
illustration, have a look at a few sentiment scores from the Trump
dataset:

===  =========  =====
#    Sentiment  Tweet
===  =========  =====
1    -0.5106    RT @TEN\_GOP: BREAKING: Massive riots happening now in Sweden. Stockholm in flames. Trump was right again!
2    -0.8744    RT @kurteichenwald: Intel shows our ally, Sweden, has no rise in crime. Trump saw on Fox it does. So he ignores intel, attacks our ally. ht…
3    0.7003     RT @NoBoomGaming: I'm a glass half full kind of guy. Now that Trump won, think of all the new memes we'll have over the next four years!
4    0.6249     RT @SandraTXAS: Nikki Haley is kicking a$$ at the UN👊💥💥 Trump made a great choice for envoy to the UN!! #Israel #MAGA
===  =========  =====

The first tweet clearly voices support for Donald Trump, yet we get a
negative score. The second tweet is clearly in opposition, and it also
produces a very negative sentiment. The fourth tweet is a case of
sentiment aligning with approval. Clearly, sentiment scores should not
be confused with ideological alignment or approval, because it can go
both ways! You can approve and make a negative comment, and you can
disapprove and make a positive-sounding comment! Don't even get me
started on sarcastic tweets (see the third one).

Sentiment scores tend to be more meaningful for non-ideological topics
such as products and services. For example, here are some tweets from
the Tesla dataset:

===  =========  =====
#    Sentiment  Tweet
===  =========  =====
1    -0.296     Tesla is ‘illegally selling cars’ in Connecticut, says Dealership Association as they try to stop direct-sale bill
2    -0.5859    Supercharger Realtime Availability Map is offline until further notice. I am no longer receiving data as Tesla asked for it to be cut off.
3    0.5859     Elon Musk Steps Forward To Help Tesla Driver Who Sacrificed Car To Save Stroke Victim via @aplusapp
4    0.4404     RT @ElectrekCo: Tesla Model 3: aluminum part supplier announces investment to increase output ahead of Model 3 production …
===  =========  =====

.. |f2| image:: http://chart.apis.google.com/chart?cht=tx&chl=S_t
.. |f3| image:: http://chart.apis.google.com/chart?cht=tx&chl=s_t
.. |f5| image:: http://chart.apis.google.com/chart?cht=tx&chl=%5calpha
.. |f4| image:: http://chart.apis.google.com/chart?cht=tx&chl=S_0=0
