Skip to main content

A Twitter streaming library built on tweepy that enables dynamic term tracking

Project description

Twitter Monitor
===============

[![Build Status](https://travis-ci.org/michaelbrooks/twitter-monitor.svg?branch=master)](https://travis-ci.org/michaelbrooks/twitter-monitor)
[![Coverage Status](https://coveralls.io/repos/michaelbrooks/twitter-monitor/badge.svg?branch=master)](https://coveralls.io/r/michaelbrooks/twitter-monitor?branch=master)

A Twitter streaming library built on [Tweepy](https://github.com/tweepy/tweepy) that enables dynamic tracking
of the [filtered Twitter Streaming API](https://dev.twitter.com/docs/api/1.1/post/statuses/filter).

This library provides a framework that you can use to build your own dynamic Twitter term tracking system.
You will want to do three things:

1. Create a subclass of `TermChecker` that knows how to look for tracked terms (e.g. in a database or a file).
There is a `FileTermChecker` provided as an example.
2. Create a subclass of `JsonStreamListener` that does something interesting with the tweets. Maybe write tweets
to a file a database.
3. Start an instance of the `DynamicTwitterStream` class, which ties it all together.

There is also a `stream_tweets` script you can use to get started
streaming tweets more quickly. More information is [below](#streaming-script).


####Installation

This package is available on PyPI [here](https://pypi.python.org/pypi/twitter-monitor).

```bash
You can't use 'macro parameter character #' in math mode stream_tweets > tweets.json
```

The required settings can be provided via environment variables,
a `.ini` file, or command-line arguments.
The command-line arguments take precedent:

```bash
$ stream_tweets --api-key XXXX --api-secret XXXX \
--access-token XXXX --access-token-secret XXXX \
--track-file my/track/file.txt \
--poll-interval 15
```

The `--poll-interval` option defines how often to check the track file
for updated terms. You can also use the option `--unfiltered TRUE` to
enable capturing tweets without terms.

Alternatively, one or more of the options may be defined in a `.ini` file.
The script will search in the current directory for `twitter_monitor.ini`, but this can be overridden
using the `--ini-file` argument.
Below is an example `twitter_monitor.ini`:

```ini
[twitter]
api_key=XXXX
api_secret=XXXX
access_token=XXXX
access_token_secret=XXXX
track_file=my/track/file.txt
poll_interval=15
unfiltered=TRUE
```

If options are not defined on the command line or in an ini file,
environment variables are checked. Below are the names of the corresponding
environment variables:

```bash
TWITTER_API_KEY=XXXX
TWITTER_API_SECRET=XXXX
TWITTER_ACCESS_TOKEN=XXXX
TWITTER_ACCESS_TOKEN_SECRET=XXXX
TWITTER_TRACK_FILE=my/track/file.txt
TWITTER_POLL_INTERVAL=15
TWITTER_UNFILTERED=TRUE
```

Custom Usage
-------------

Below is a simple example of how to set up and initialize a dynamic Twitter stream.
This example uses the `FileTermChecker` and default `JsonStreamListener` implementations.
There is a working example in the `twitter_monitor/basic_stream.py` file.

```python
import tweepy
import twitter_monitor

# The file containing terms to track
terms_filename = "tracking_terms.txt"

# How often to check the file for new terms
poll_interval = 15

# Your twitter API credentials
api_key = 'YOUR API KEY'
api_secret = 'YOUR API SECRET'
access_token = 'YOUR ACCESS TOKEN'
access_token_secret = 'YOUR ACCESS TOKEN SECRET'

auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_token_secret)

# Construct your own subclasses here instead
listener = twitter_monitor.listener.JsonStreamListener()
checker = twitter_monitor.checker.FileTermChecker(filename=terms_filename)

# Start and maintain the streaming connection...
stream = twitter_monitor.DynamicTwitterStream(auth, listener, checker)
while True:
try:
# Loop and keep reconnecting in case something goes wrong
# Note: You may annoy Twitter if you reconnect too often under some conditions.
stream.start_polling(poll_interval)
except Exception as e:
print e
time.sleep(1) # to avoid craziness with Twitter
```


### Checking for Terms

To create a custom `TermChecker`, you need to override the `update_tracking_terms(self)` method.
This method must return a *set* of terms. `update_tracking_terms()` will be called
on your checker periodically to refresh the term list.

The `twitter_monitor.checker.FileTermChecker` class is included as an example.

If you are not using filter terms, construct your DynamicTwitterStream
object with the `unfiltered` keyword argument set to True.

### Handling Tweets

The Twitter streaming API emits various types of messages.
The `JsonStreamListener` class includes stub methods for handling each of these.
Please refer to the [documentation](https://dev.twitter.com/docs/streaming-apis/messages) for more information
about what these messages mean.

Create a subclass of `JsonStreamListener`, overriding the handler methods for any message types you wish to respond to.
Here is a simple Listener that just prints out tweets:

```python
import twitter_monitor
import json

class PrintingListener(twitter_monitor.JsonStreamListener):

def on_status(self, status):
print json.dumps(status, indent=3)

def on_limit(self, track):
print "Horrors, we lost %d tweets!" % track
```

Note that the `on_exception()` handler is a bit different. It is called when there is some exception
from within the tweepy streaming thread. By default the exception will be stored in the `stream_exception` field
on your listener object.

More info about how listeners are used may be gleaned from the
[Tweepy source code](https://github.com/tweepy/tweepy/blob/master/tweepy/streaming.py#L22).

Questions and Contributing
--------------------------

Feel free to post questions and problems on the issue tracker. Pull requests welcome!

Use `python setup.py test` to run tests.


### Creating a release

1. Increment the version number in `setup.py`. Commit and push.
2. Create a new Release in GitHub with the appropriate version tag.
3. Run `setup.py sdist bdist` to build the distribution for PyPi.
4. Run `twine upload -u USERNAME -p PASSWORD dist/*` to upload to PyPi.
You must have [twine](https://github.com/pypa/twine) installed.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page