A Django application that crawls and downloads online content by following given instructions

Project description

Features

  • Extract content from given websites/pages using XPath queries.

  • The process can be started from the command line (e.g. as a cron job) or from inside Django code

  • Automatically browse and download content from related pages, up to a given depth.

  • Support metadata extraction along with other content

  • Apply content refinement rules and black-word filtering

  • Store downloaded content and prevent duplicates

  • Allow changing the User-Agent

  • Support proxy servers
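
To make three of the features above concrete (XPath extraction, black-word filtering, and duplicate prevention), here is a minimal standard-library sketch. It is not django-scraper's actual API: the sample page, the `BLACK_WORDS` list, and the helper names `extract`, `is_clean`, and `deduplicate` are all hypothetical, and `xml.etree.ElementTree` only supports a limited XPath subset and requires well-formed markup.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical sample page; kept well-formed so the stdlib parser accepts it.
PAGE = """<html><body>
<div class="post"><h1>First post</h1><p>Hello world</p></div>
<div class="post"><h1>Buy cheap pills</h1><p>Spam text</p></div>
<div class="post"><h1>First post</h1><p>Hello world</p></div>
</body></html>"""

# Hypothetical black-word list used to reject unwanted content.
BLACK_WORDS = {"pills"}


def extract(page, xpath):
    """Return the stripped text of every element matched by the XPath query."""
    root = ET.fromstring(page)
    return ["".join(node.itertext()).strip() for node in root.findall(xpath)]


def is_clean(text, black_words=BLACK_WORDS):
    """Return True when the text contains none of the black words."""
    lowered = text.lower()
    return not any(word in lowered for word in black_words)


def deduplicate(items):
    """Drop repeated items, keeping first occurrences, by comparing SHA-256 digests."""
    seen = set()
    unique = []
    for item in items:
        digest = hashlib.sha256(item.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(item)
    return unique


# Extract all post titles, filter out black-listed ones, then remove duplicates.
titles = extract(PAGE, ".//div[@class='post']/h1")
result = deduplicate([title for title in titles if is_clean(title)])
print(result)
```

A real crawler would typically use a more tolerant HTML parser with full XPath support and persist the digests (for example in a database table) so duplicates are detected across runs; both choices are left out here to keep the sketch self-contained.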

Documentation

The full documentation is not ready yet. For notes on installation and usage, see: https://github.com/zniper/django-scraper

Support

If you have any questions about this application, please send an email to me[at]zniper.net

Download files

Source Distribution

django-scraper-0.2.2.tar.gz (9.8 kB)
