skip to navigation
skip to content

Not Logged In

Parker 0.7.1

A web spider for collecting specific data across a set of configured sites

Latest Version: 0.9.6

Parker is a Python-based web spider for collecting specific data across a set of configured sites.

Non-Python requirements:

  • Redis - for task queuing and visit tracking
  • libxml - for HTML parsing of pages

Installation

Install using pip:

$ pip install parker

Configuration

To configure Parker, you will need to install the configuration files in a suitable location for the user running Parker. To do this, use the parker-config script. For example:

$ parker-config ~/.parker

This will install the configuration in your homedir and will output the related environment variable for you to set in your .bashrc.

Changes

0.7.1

  • Patch to fix an issue where, if class is not present in the site config, the path includes “None”.

0.7.0

  • Rework the client to allow for improved proxy failover should we need it. Improve testing a little to back this up.
  • Add tagging to the configuration. These are simply passed through to the resulting JSON objects output by the model so that you can tag them with whatever you want.
  • Add classification to the configuration. Again this is passed through, but is also used in the output file path from the consumer worker.

0.6.0

  • Add tracking of visited URIs as well as page hashes to the crawl worker. Use that to reduce the number of URIs added to the crawl queue.

0.5.1

  • Fix an issue with the order of key-value reference resolution that prevented the effective use of unique_field if using a field that was a kv_ref.
  • Add some Parker specific configuration so we can specify where to download, in case the PROJECT env variable doesn’t exist.

0.5.0

  • Update ConsumeModel to post process the data. This enables us to populate specific data from a reference to a key-value field.
  • Reorder changes so newest first, and rename to “Changes” in the long description.

0.4.2

  • Bug fix to fix RST headers which may be the problem.
  • Remove the decode/encode which is not the issue.

0.4.1

  • Bug fix to see if RST in ASCII fixes issues on PyPI.

0.4.0

  • Added handling for a PARKER_CONFIG environment variable, allowing users to specify where configuration files are loaded from.
  • Added the parker-config script to install default configuration files to a passed location. Also prints out an example PARKER_CONFIG environment variable to add to your profile files.
  • Updated documentation to use proper reStructuredText files.
  • Add a CHANGES file to track updates.
 
File Type Py Version Uploaded on Size
Parker-0.7.1-py2.py3-none-any.whl (md5) Python Wheel 2.7 2014-07-31 18KB
Parker-0.7.1.tar.gz (md5) Source 2014-07-31 136KB
  • Downloads (All Versions):
  • 164 downloads in the last day
  • 854 downloads in the last week
  • 2877 downloads in the last month