tweetvac

Package that makes sucking down tweets from Twitter easy.

Development Status
- 5 - Production/Stable
Environment
- Console
Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License
Natural Language
- English
Operating System
- OS Independent
Programming Language
- Python
- Python :: 3
Topic
- Internet :: WWW/HTTP
- Software Development :: Libraries :: Python Modules

Project description

Python package for sucking down tweets from Twitter. It implements Twitter’s guidelines for working with timelines so that you don’t have to.

tweetvac supports retrospective pulling of tweets from Twitter. For example, it can pull down a large number of tweets by a specific user, or all the tweets from a geographic area that mention a search term. It automatically generates the requests needed to work backward along the timeline.
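The backward walk can be pictured with a small self-contained sketch. It does not call Twitter or tweetvac: `fetch_page` and `walk_backward` are hypothetical names, and `fetch_page` is a stand-in for a single timeline request. The loop follows the `max_id` approach from Twitter's timeline guidelines, where each request asks only for tweets older than the lowest ID seen so far. How tweetvac implements this internally may differ.

```python
# Simulated timeline: tweet IDs in descending (newest-first) order.
TIMELINE = [{"id": i} for i in range(1000, 990, -1)]


def fetch_page(count, max_id=None):
    """Hypothetical stand-in for one timeline request (no network access)."""
    eligible = [t for t in TIMELINE if max_id is None or t["id"] <= max_id]
    return eligible[:count]


def walk_backward(count, max_requests=None):
    """Collect tweets page by page, moving toward older IDs."""
    collected = []
    max_id = None
    requests_made = 0
    while max_requests is None or requests_made < max_requests:
        page = fetch_page(count, max_id)
        requests_made += 1
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # Next time, ask only for tweets strictly older than this page.
        max_id = min(t["id"] for t in page) - 1
    return collected


tweets = walk_backward(count=4)
print(len(tweets), tweets[0]["id"], tweets[-1]["id"])
```

Subtracting one from the lowest ID keeps the oldest tweet of one page from being returned again at the top of the next.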

Installation

Install tweetvac using pip:

$ pip install tweetvac

If you clone the repository instead, you also need to install twython and its dependencies.

Authentication

Twitter requires OAuth. tweetvac can store a user’s authentication information in a configuration file for reuse.

1. Log in to Twitter and open https://dev.twitter.com/apps.
2. Create a new application. The name needs to be unique across all Twitter apps. A callback is not needed.
3. Create an OAuth access token on your application web page.
4. Create a file called tweetvac.cfg and format it as follows:

[Auth]
consumer_key = Gx33LSA3IICoqqPoJOp9Q
consumer_secret = 1qkKAljfpQMH9EqDZ8t50hK1HbahYXAUEi2p505umY0
oauth_token = 14574199-4iHhtyGRAeCvVzGpPNz0GLwfYC54ba3sK5uBl4hPe
oauth_token_secret = K80YytdT9FRXEoADlVzJ64HDQEaUMwb37N9NBykCNw5gw

Alternatively, you can pass those four parameters as a tuple, in the order shown above, to the TweetVac constructor rather than storing them in a configuration file.
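If you want to keep the file but build the tuple yourself, the standard library's `configparser` can read this format. The following is a minimal sketch under that assumption: `load_auth_tuple` is a hypothetical helper that is not part of tweetvac, and the values are placeholders rather than real credentials.

```python
import configparser

# Placeholder values; substitute the credentials from your own application.
SAMPLE_CFG = """
[Auth]
consumer_key = YOUR_CONSUMER_KEY
consumer_secret = YOUR_CONSUMER_SECRET
oauth_token = YOUR_OAUTH_TOKEN
oauth_token_secret = YOUR_OAUTH_TOKEN_SECRET
"""

# Order of the four values as listed in the [Auth] section.
FIELDS = ("consumer_key", "consumer_secret", "oauth_token", "oauth_token_secret")


def load_auth_tuple(text):
    """Hypothetical helper: parse the [Auth] section into a 4-tuple."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    auth = parser["Auth"]
    return tuple(auth[name] for name in FIELDS)


auth_tuple = load_auth_tuple(SAMPLE_CFG)
print(auth_tuple)
```

To read an actual file, `parser.read("tweetvac.cfg")` could replace the `read_string` call.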

The Basics

Import tweetvac

import tweetvac

Create a TweetVac instance

You can pass the OAuth parameters as a tuple:

vac = tweetvac.TweetVac((consumer_key, consumer_secret, oauth_token, oauth_token_secret))

or use the configuration object:

config = tweetvac.AuthConfig()
vac = tweetvac.TweetVac(config)

Suck down tweets

tweetvac expects a Twitter endpoint and a dictionary of parameters for that endpoint. Read the Twitter documentation for a list of endpoints and their parameters. Set the count option in the params dict to the largest value that endpoint supports, so that each request returns as many tweets as possible.

params = {'screen_name': 'struckDC', 'count': 200}
data = vac.suck('statuses/user_timeline', params)

Work with the data

The data returned is a list of dicts. The fields in each dict are listed in the Twitter API documentation on the Tweet object.

The data can be converted back to JSON and stored in a file like this:

import json

with open('data.json', 'w') as outfile:
    json.dump(data, outfile)
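As a rough illustration of the round trip, the sketch below writes a list of dicts to disk, reads it back, and pulls out one field. The two tweets are made up and carry only a few fields; real Tweet objects contain many more. A temporary directory is used so the example does not overwrite an existing data.json.

```python
import json
import os
import tempfile

# Two made-up tweets carrying only the fields used below.
data = [
    {"id": 2, "text": "second tweet", "created_at": "Thu Feb 06 12:00:00 +0000 2014"},
    {"id": 1, "text": "first tweet", "created_at": "Wed Feb 05 12:00:00 +0000 2014"},
]

# Write to a scratch location rather than the working directory.
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w") as outfile:
    json.dump(data, outfile)

# Load the file again; the result is the same list of dicts.
with open(path) as infile:
    restored = json.load(infile)

texts = [tweet["text"] for tweet in restored]
print(texts)
```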

Advanced

Filtering the tweets

Twitter provides several parameters on each endpoint for selecting what tweets you want to retrieve. Additional culling is available by passing a list of filter functions.

def remove_mention_tweets(tweet):
    # Keep only tweets whose text contains no '@' mention.
    return '@' not in tweet['text']

data = vac.suck('statuses/user_timeline', params, filters=[remove_mention_tweets])

Return False from a filter function to remove the tweet from the list.
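One way to picture how a list of filters behaves is the sketch below. It assumes a tweet is kept only when every filter returns a truthy value, which matches the description above but is not taken from tweetvac's source. `apply_filters` and `remove_retweets` are hypothetical names, and the sample tweets are made up.

```python
def remove_mention_tweets(tweet):
    # Keep only tweets whose text contains no '@' mention.
    return '@' not in tweet['text']


def remove_retweets(tweet):
    # Hypothetical second filter: drop old-style "RT ..." retweets.
    return not tweet['text'].startswith('RT ')


def apply_filters(tweets, filters):
    """Assumed semantics: keep a tweet only if all filters accept it."""
    return [t for t in tweets if all(f(t) for f in filters)]


sample = [
    {'text': 'hello world'},
    {'text': '@bob hi there'},
    {'text': 'RT something good'},
]

kept = apply_filters(sample, [remove_mention_tweets, remove_retweets])
print(kept)
```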

Turning off the vacuum

tweetvac will suck down tweets until you reach your rate limit or you consume all the available tweets. To stop sooner, you can pass a cutoff function that returns True when tweetvac should stop.

import time

def stop(tweet):
    # Stop once a tweet older than January 1, 2014 (UTC) is reached.
    cutoff_date = time.strptime("Wed Jan 01 00:00:00 +0000 2014", '%a %b %d %H:%M:%S +0000 %Y')
    tweet_date = time.strptime(tweet['created_at'], '%a %b %d %H:%M:%S +0000 %Y')
    return tweet_date < cutoff_date

data = vac.suck('statuses/user_timeline', params, cutoff=stop)
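If you need several cutoff dates, a small factory avoids hard-coding the date in the function. The sketch below is one possible variant, not part of tweetvac: `make_cutoff` is a hypothetical helper, and it uses `datetime` with the `%z` directive so the UTC offset is parsed instead of matched literally. It assumes an English locale for the day and month names.

```python
from datetime import datetime, timezone

# Format of the created_at field, with %z parsing the "+0000" offset.
TWITTER_DATE_FORMAT = '%a %b %d %H:%M:%S %z %Y'


def make_cutoff(cutoff_date):
    """Hypothetical helper: build a cutoff function for a given date."""
    def stop(tweet):
        tweet_date = datetime.strptime(tweet['created_at'], TWITTER_DATE_FORMAT)
        # True tells the caller to stop: this tweet predates the cutoff.
        return tweet_date < cutoff_date
    return stop


stop = make_cutoff(datetime(2014, 1, 1, tzinfo=timezone.utc))

old_tweet = {'created_at': 'Tue Dec 31 23:59:59 +0000 2013'}
new_tweet = {'created_at': 'Thu Jan 02 08:00:00 +0000 2014'}
print(stop(old_tweet), stop(new_tweet))
```

The returned function takes a single tweet dict, so it has the same shape as the cutoff function shown above.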

You can also pass a hard limit on the number of requests to stop tweetvac early:

data = vac.suck('statuses/user_timeline', params, max_requests=10)

Twitter API

Supported Endpoints

statuses/user_timeline - tweets by the specified user.
statuses/home_timeline - tweets by those followed by the authenticating user.
statuses/mentions_timeline - tweets mentioning the authenticating user.
statuses/retweets_of_me - tweets that are retweets of the authenticating user.
search/tweets - tweets matching a search query.

The endpoints have different request rate limits, count limits per request, and total tweet count limits.

Release history
- 1.0.1 - Mar 1, 2020 (this version)
- 1.0 - Feb 13, 2020
- 0.3 - Feb 6, 2014
- 0.2 - Feb 6, 2014
- 0.1 - Feb 6, 2014

Download files

Source Distribution

tweetvac-1.0.1.tar.gz (6.7 kB)

Uploaded Mar 1, 2020 Source

Built Distribution

tweetvac-1.0.1-py3-none-any.whl (6.6 kB)

Uploaded Mar 1, 2020 Python 3

Hashes for tweetvac-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`0267ff1c8229c9cbf596cb64befb888b8a014e068dfe14fa243f6d8fc55cd71e`
MD5	`fc99e4720daaec7bfc4fe4a4fa2e69fe`
BLAKE2b-256	`64d68a980b6daf7f08262ca8855bc8a8fb7a5cad8d3d866ae98ef6f6f68accdc`

Hashes for tweetvac-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f0792e58286786b79e23aed0143287b0e54fb465095cb7aad7f5032286ef46e6`
MD5	`4ca4cfaa9a8d12f499d01520f04fd455`
BLAKE2b-256	`7be5a9fa984a6a82179e158a8e82ec5f5270eb5edbaebaac5e7e6e887e75242c`