django-scraper 0.3.8

Django application for collecting online content following user-defined instructions

django-scraper is a Django application for collecting online content by following user-defined instructions.

Features

  • Extracts content from given websites or pages and stores it as JSON data
  • Crawls and extracts content across multiple pages, up to a given depth
  • Can download media files present in a page
  • Optionally stores data in a ZIP file
  • Supports the standard file system and AWS S3 storage
  • Customisable crawling requests for different scenarios
  • Can be started from a Django management command (e.g. as a cron job) or from Python code
  • Supports extracting multiple kinds of content (text, HTML, images, binary files) from the same page
  • Provides content refinement (replacement) rules and black-word filtering
  • Supports custom proxy servers and user agents
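
To make the crawl depth, refinement rules and black-word filtering above more concrete, here is a minimal sketch of the idea in plain Python. It does not use django-scraper's own API: the names PAGES, crawl, refine, has_black_word and scrape are hypothetical, the "site" is an in-memory dictionary rather than real HTTP requests, and dropping a whole page when it contains a black word is an assumption about how such filtering might behave.

```python
import json
import re
from collections import deque

# Hypothetical in-memory "site": URL -> (page text, outgoing links).
PAGES = {
    "/": ("Welcome   to the  index", ["/a", "/b"]),
    "/a": ("Page A mentions spam offers", ["/c"]),
    "/b": ("Page B   is clean", ["/c"]),
    "/c": ("Page C is two hops away", []),
}


def crawl(start, depth, fetch):
    """Breadth-first crawl from `start`, following links up to `depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    results = []
    while queue:
        url, level = queue.popleft()
        text, links = fetch(url)
        results.append({"url": url, "depth": level, "text": text})
        if level < depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, level + 1))
    return results


def refine(text, rules):
    """Apply (pattern, replacement) regex rules in order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def has_black_word(text, black_words):
    """Return True if any black word occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in black_words)


def scrape(start, depth, fetch, rules=(), black_words=()):
    """Crawl, refine each page, drop pages with black words, return JSON."""
    items = []
    for item in crawl(start, depth, fetch):
        item["text"] = refine(item["text"], rules)
        if not has_black_word(item["text"], black_words):
            items.append(item)
    return json.dumps(items, indent=2)


if __name__ == "__main__":
    # Collapse runs of whitespace and skip pages that mention "spam".
    print(scrape("/", 1, PAGES.__getitem__,
                 rules=[(r"\s+", " ")], black_words=["spam"]))
```

In a real setup the fetch callable would presumably download the page (for instance with requests) and extract text and links (for instance with lxml); it is passed in as a parameter here so that the sketch runs offline.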

Supports Django 1.6, 1.7, and 1.8.

Samples

Below is a sample result from scraping https://news.ycombinator.com/ask

Installation

This application requires the following packages to be installed first:

lxml
requests

django-scraper can be installed using pip:

pip install django-scraper
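
As a quick sanity check after installation, a short script along the following lines can report whether the required packages are importable. This is only a sketch using the standard library; the helper name missing_packages is hypothetical and is not part of django-scraper.

```python
import importlib.util

# Top-level packages listed as requirements above.
REQUIRED = ("lxml", "requests")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are installed.")
```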

For more detailed and up-to-date information about configuration and usage, please visit the repository on GitHub: https://github.com/zniper/django-scraper

Support

If you have any questions about this application, please email me@zniper.net

 
File                               Type    Py Version  Uploaded on  Size
django-scraper-0.3.8.tar.gz (md5)  Source              2015-05-26   61KB