
Gathers unstructured News data into a SQLite3 db

Project description

Gathers unstructured News data and commits it to a SQLite3 database.

News is valuable when it contains actionable information. Collecting the news for text analytics should be easy, even when the news is unstructured data. The goal of GatherNews is to gather News data quickly and simply. We want to know who, what, when, where, why, and how, and we want it in a SQL database.

GatherNews lets you choose which News sites to capture: list each site's RSS feed URL in “feeds_list.txt”, one per line, like this:

http://feeds.reuters.com/Reuters/worldNews
http://rss.cnn.com/rss/money_latest.rss
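The file format is simple enough to read with a few lines of Python. The sketch below is an illustration, not GatherNews's own loader: `load_feed_urls` is a hypothetical helper that assumes one URL per line and ignores blank lines.

```python
from pathlib import Path


def load_feed_urls(path="feeds_list.txt"):
    """Hypothetical helper: return the feed URLs listed in `path`.

    Assumes one URL per line; surrounding whitespace and blank lines are ignored.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With the two feeds above saved in `feeds_list.txt`, `load_feed_urls()` would return both URLs as a two-item list.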

You can then gather the news using these methods:

>>> # Assumes `capture_feeds` is already set up; the examples folder shows how
>>> # Create new tables if any new RSS feed addresses have been added
>>> capture_feeds.create_tables()
>>> # Populate all tables with RSS news feeds
>>> capture_feeds.populate_db()
>>> # Remove duplicate entries
>>> capture_feeds.rm_duplicates()
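The three calls above belong to GatherNews. The sketch below is not their implementation; it shows, using only Python's standard `sqlite3` module, one plausible way to do the same jobs. Everything in it is an assumption: one table per feed, a table name derived from the feed URL, a four-column schema, and duplicates detected by matching `link` values. `table_name_for`, `create_feed_tables`, `insert_articles`, and `remove_duplicate_rows` are hypothetical names.

```python
import re
import sqlite3


def table_name_for(feed_url):
    """Hypothetical naming scheme: turn a feed URL into a safe table name."""
    name = re.sub(r"^https?://", "", feed_url)
    # Replace every run of characters other than letters and digits with "_".
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")


def create_feed_tables(conn, feed_urls):
    """Create one table per feed; tables that already exist are left alone."""
    for url in feed_urls:
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{table_name_for(url)}" '
            "(title TEXT, link TEXT, published TEXT, summary TEXT)"
        )
    conn.commit()


def insert_articles(conn, feed_url, articles):
    """Insert (title, link, published, summary) tuples for one feed."""
    conn.executemany(
        f'INSERT INTO "{table_name_for(feed_url)}" VALUES (?, ?, ?, ?)', articles
    )
    conn.commit()


def remove_duplicate_rows(conn, feed_urls):
    """Keep only the first row (lowest rowid) for each link in every table."""
    for url in feed_urls:
        table = table_name_for(url)
        conn.execute(
            f'DELETE FROM "{table}" WHERE rowid NOT IN '
            f'(SELECT MIN(rowid) FROM "{table}" GROUP BY link)'
        )
    conn.commit()


if __name__ == "__main__":
    feeds = ["http://rss.cnn.com/rss/money_latest.rss"]
    conn = sqlite3.connect(":memory:")
    create_feed_tables(conn, feeds)
    row = ("Headline", "http://example.com/a", "2015-01-01", "Summary")
    insert_articles(conn, feeds[0], [row, row])  # same article fetched twice
    remove_duplicate_rows(conn, feeds)
    table = table_name_for(feeds[0])
    print(conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0])  # prints 1
```

Table names cannot be passed as `?` parameters in SQLite, which is why the sketch reduces them to letters, digits, and underscores before quoting them; article values still go through placeholders.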

The examples folder contains working code for each module.

Features

  • Parses RSS feeds and commits each news article to a SQLite3 database

  • Works around URL encoding errors
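Both features can be illustrated with the Python standard library. This is a sketch under stated assumptions, not GatherNews's code: `parse_rss` and `safe_url` are hypothetical helpers, the parser assumes RSS 2.0 element names (`item`, `title`, `link`, `pubDate`, `description`), and percent-encoding with `urllib.parse.quote` is just one common way to handle URLs containing spaces or non-ASCII characters; the page does not say which workaround GatherNews uses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def parse_rss(xml_text):
    """Hypothetical helper: return (title, link, pubDate, description) per <item>.

    Assumes an RSS 2.0 document; Atom feeds use different element names.
    """
    root = ET.fromstring(xml_text)
    fields = ("title", "link", "pubDate", "description")
    # findtext() falls back to "" when an item lacks one of the fields.
    return [tuple(item.findtext(f, default="") for f in fields) for item in root.iter("item")]


def safe_url(url):
    """Hypothetical helper: percent-encode spaces and non-ASCII characters.

    URL delimiters and existing %-escapes are listed as safe, so they pass
    through unchanged and an already-encoded URL is not double-encoded.
    """
    return quote(url, safe=":/?#[]@!$&'()*+,;=%")


if __name__ == "__main__":
    sample = (
        "<rss version='2.0'><channel><title>Demo</title>"
        "<item><title>Headline</title><link>http://example.com/a</link></item>"
        "</channel></rss>"
    )
    print(parse_rss(sample))  # [('Headline', 'http://example.com/a', '', '')]
    print(safe_url("http://example.com/café news"))  # http://example.com/caf%C3%A9%20news
```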

Installation

To install GatherNews use pip:

$ pip install gathernews

Documentation

Documentation is available at https://github.com/Bonza-Times/GatherNews/wiki

Contribute

  1. Issue tracker is here: https://github.com/Bonza-Times/GatherNews/issues

  2. Feel free to email tylers.pile@gmail.com about anything.

  3. Fork it!



Download files


Source Distribution

GatherNews-0.1.0.tar.gz (4.2 kB)

