Parker 0.7.3

A web spider for collecting specific data across a set of configured sites

Latest Version: 0.9.6

Parker is a Python-based web spider for collecting specific data across a set of configured sites.

Non-Python requirements:

  • Redis - for task queuing and visit tracking
  • libxml - for HTML parsing of pages


Install using pip:

$ pip install parker


To configure Parker, you will need to install the configuration files in a suitable location for the user running Parker. To do this, use the parker-config script. For example:

$ parker-config ~/.parker

This installs the configuration files in ~/.parker in your home directory and prints the matching PARKER_CONFIG environment variable for you to set in your .bashrc.
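
As an illustration of how a script might locate that directory at runtime, the following minimal sketch reads the PARKER_CONFIG environment variable and falls back to ~/.parker when it is unset. The helper name `resolve_config_dir` and the fallback value are assumptions made for this example; this is not Parker's own loader.

```python
import os


def resolve_config_dir(environ=None, default="~/.parker"):
    """Return the configuration directory (hypothetical helper, not Parker's API)."""
    environ = os.environ if environ is None else environ
    # Prefer PARKER_CONFIG when it is set and non-empty; otherwise use the
    # assumed default location.
    path = environ.get("PARKER_CONFIG") or default
    return os.path.abspath(os.path.expanduser(path))
```

Passing a plain dictionary as `environ` makes the lookup easy to exercise without touching the real environment.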



  • Patch to fix an issue where the consumer was overlooking media URIs that start with / and are therefore relative to the base_uri configuration.
  • Added boto to the requirements for future use.


  • Patch to fix an issue where the crawler was overlooking URIs that start with / and are therefore relative to the base_uri configuration.
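
The kind of resolution these two patches describe can be sketched with the standard library. This is only an illustration, assuming Python 3's `urllib.parse` (on Python 2.7 the same function lives in `urlparse`); the function name and the example host are hypothetical, and Parker's own code may handle this differently.

```python
from urllib.parse import urljoin


def absolute_uri(base_uri, href):
    """Resolve an href found on a page against the site's base URI."""
    # urljoin treats a leading "/" as relative to the host root of base_uri
    # and leaves hrefs that are already absolute untouched.
    return urljoin(base_uri, href)


# Hypothetical base_uri value, for illustration only.
base_uri = "http://www.example.com/shop/"
print(absolute_uri(base_uri, "/images/a.jpg"))
print(absolute_uri(base_uri, "item/1"))
```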


  • Patch to fix an issue where, if class is not present in the site config, the path includes “None”.


  • Rework the client to allow for improved proxy failover should we need it. Improve testing a little to back this up.
  • Add tagging to the configuration. These are simply passed through to the resulting JSON objects output by the model so that you can tag them with whatever you want.
  • Add classification to the configuration. Again this is passed through, but is also used in the output file path from the consumer worker.
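
A rough sketch of how tags and a classification could be passed through to the JSON output and into the output path. Every name here (`build_output`, the `root` default, the key names, the path layout) is an assumption for illustration and not Parker's actual model or consumer code. Missing components are skipped so that an absent classification never shows up as "None" in the path.

```python
import json
import os


def build_output(data, site, tags=None, classification=None, root="/tmp/parker"):
    """Return (path, json_text) for one consumed item; all names are hypothetical."""
    record = dict(data)
    # Tags are passed through unchanged to the resulting JSON object.
    record["tags"] = list(tags or [])
    if classification is not None:
        record["classification"] = classification
    # Drop missing components so the path never contains the string "None".
    parts = [p for p in (root, classification, site) if p]
    path = os.path.join(*parts) + ".json"
    return path, json.dumps(record, sort_keys=True)
```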


  • Add tracking of visited URIs as well as page hashes to the crawl worker. Use that to reduce the number of URIs added to the crawl queue.
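
The idea behind this change can be sketched in memory. Parker keeps its visit tracking in Redis; the sets below are a stand-in for that, and the class name, method names and choice of SHA-1 are assumptions made for the example.

```python
import hashlib


class VisitTracker:
    """In-memory stand-in for visit tracking; names are hypothetical."""

    def __init__(self):
        self.seen_uris = set()
        self.seen_hashes = set()

    def should_queue(self, uri):
        """Return True the first time a URI is offered, False afterwards."""
        if uri in self.seen_uris:
            return False
        self.seen_uris.add(uri)
        return True

    def is_new_page(self, content):
        """Return True if this page body (bytes) has not been seen before."""
        digest = hashlib.sha1(content).hexdigest()
        if digest in self.seen_hashes:
            return False
        self.seen_hashes.add(digest)
        return True
```

A crawl worker built along these lines would call `should_queue` before adding a link to the crawl queue and `is_new_page` after fetching, so that the same content reached through different URIs is not processed twice.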


  • Fix an issue with the order of key-value reference resolution that prevented the effective use of unique_field if using a field that was a kv_ref.
  • Add some Parker-specific configuration so we can specify where to download to, in case the PROJECT environment variable doesn’t exist.


  • Update ConsumeModel to post process the data. This enables us to populate specific data from a reference to a key-value field.
  • Reorder changes so newest first, and rename to “Changes” in the long description.


  • Fix the RST headers, which may be the cause of the problem.
  • Remove the decode/encode step, which was not the cause.


  • Try ASCII-only RST to see whether that fixes the issues on PyPI.


  • Added handling for a PARKER_CONFIG environment variable, allowing users to specify where configuration files are loaded from.
  • Added the parker-config script to install default configuration files to a passed location. Also prints out an example PARKER_CONFIG environment variable to add to your profile files.
  • Updated documentation to use proper reStructuredText files.
  • Add a CHANGES file to track updates.

File                                     Type          Py Version  Uploaded on  Size
Parker-0.7.3-py2.py3-none-any.whl (md5)  Python Wheel  2.7         2014-09-01   18KB
Parker-0.7.3.tar.gz (md5)                Source                    2014-09-01   136KB