
Screen scraping and web crawling framework

Project description

Pomp is a screen scraping and web crawling framework. Pomp is inspired by and similar to Scrapy, but has a simpler implementation that lacks the hard Twisted dependency.

Features:

  • Pure Python

  • Only one dependency, and only on Python 2.x: concurrent.futures (a backport of the standard-library package to Python 2.x)

  • Supports single-file applications; Pomp doesn’t force a specific project layout or impose other restrictions.

  • Pomp is a meta framework like Paste: you may use it to create your own scraping framework.

  • Extensible networking: you may use any sync or async method.

  • No parsing libraries in the core; use your preferred approach.

  • Pomp instances may be distributed and are designed to work with an external queue.
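The features above come down to keeping three pieces apart: how pages are downloaded, how they are parsed, and where pending URLs wait. The following is a minimal, self-contained sketch of that separation and not Pomp's own code. The names `LinkCrawler`, `crawl`, `ENTRY_URLS`, `extract_items` and `next_urls` are hypothetical. The downloader is assumed to be any synchronous function from URL to body. Parsing uses plain regular expressions. A `queue.Queue` stands in for the external queue that distributed instances would share.

```python
import queue
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical names for illustration only; this is NOT Pomp's API.
# A "downloader" here is any synchronous callable mapping a URL to a response body.
Fetch = Callable[[str], str]


class LinkCrawler:
    """Parsing is the crawler's job; this one uses plain regular expressions."""

    ENTRY_URLS = ["http://example.com/"]

    def extract_items(self, body: str) -> List[str]:
        # Items here are just page titles.
        return re.findall(r"<title>(.*?)</title>", body)

    def next_urls(self, body: str) -> List[str]:
        # Follow absolute links only, to keep the sketch short.
        return re.findall(r'href="(http[^"]+)"', body)


def crawl(crawler: LinkCrawler, fetch: Fetch, pending: queue.Queue, workers: int = 4) -> List[str]:
    """Drain `pending` in batches, downloading each batch concurrently."""
    items: List[str] = []
    seen = set()
    for url in crawler.ENTRY_URLS:
        pending.put(url)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while not pending.empty():
            # Collect every queued URL that has not been visited yet.
            batch = []
            while not pending.empty():
                url = pending.get()
                if url not in seen:
                    seen.add(url)
                    batch.append(url)
            # pool.map yields bodies in the same order as `batch`.
            for body in pool.map(fetch, batch):
                items.extend(crawler.extract_items(body))
                for next_url in crawler.next_urls(body):
                    pending.put(next_url)
    return items
```

For a quick offline run, `fetch` can be the `__getitem__` of an in-memory dict of pages, for example `crawl(LinkCrawler(), pages.__getitem__, queue.Queue())`. In this sketch, replacing `fetch` with another download method or `pending` with a shared queue leaves `LinkCrawler` untouched, which is the kind of decoupling the feature list describes.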

Pomp makes no attempt to accommodate:

  • redirects

  • proxies

  • caching

  • database integration

  • cookies

  • authentication

  • etc.

If you want proxies, redirects, or similar, you may use the excellent requests library as the Pomp downloader.
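Here is a minimal sketch of that approach. It uses a hypothetical `RequestsDownloader` wrapper, which is not a class shipped with Pomp. The wrapper delegates to a `requests.Session`, which by default follows redirects for GET requests and keeps cookies across calls. Optional proxies go through the session's `proxies` mapping. How such a downloader is wired into a Pomp crawler is not shown here.

```python
from typing import Any, Optional


class RequestsDownloader:
    """Hypothetical downloader wrapping a requests.Session (not part of Pomp itself).

    Redirects, cookies and proxies are handled by requests rather than by the framework.
    """

    def __init__(self, session: Optional[Any] = None, proxies: Optional[dict] = None,
                 timeout: float = 10.0) -> None:
        if session is None:
            # Imported lazily so a caller can inject a preconfigured (or fake) session.
            import requests
            session = requests.Session()
        if proxies:
            # Session.proxies is a dict applied to every request made by the session.
            session.proxies.update(proxies)
        self.session = session
        self.timeout = timeout

    def fetch(self, url: str) -> str:
        # session.get follows redirects by default and reuses the session's cookies.
        response = self.session.get(url, timeout=self.timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
        return response.text
```

Authentication could be added the same way, for example by setting `session.auth` before passing the session in, so the framework itself never needs to know about it.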

Pomp examples

Pomp docs


Pomp is written and maintained by Evgeniy Tatarkin and is licensed under the BSD license.
