Skip to main content

Envelope for archive.org API.

Project description

Wayback Machine

This project is an envelope for simple fetching of historical versions of page from archive.org API.

The page can be used for subsequent webscraping

Setup and usage

Install from pip with

pip install waybackmachine

Simple usage of the WaybackMachine class is as

from waybackmachine import WaybackMachine

url = "https://www.gov.pl/web/koronawirus/wykaz-zarazen-koronawirusem-sars-cov-2"
for response,version_time in WaybackMachine(url):
    # process response
    pass

The iterated version goes from newest to the older and older version all the way to end date at given step of date axis for querying the archive.

Returned are

  • response = response object from requests
  • version_time = datetime of the version

Update of package is done with

pip install --upgrade waybackmachine

Start, end and step configuration

Library enables setting of start date, end date and step size as timedelta.

Since iterating is done backwards in time, end date precedes start date!

Setting the querying for weekly from 1st May back to 1st February 2020 is done with

from datetime import datetime,timedelta
from waybackmachine import WaybackMachine

url = "https://www.liu.se/"
for response,version_time in WaybackMachine(url, start = datetime(2020,5,1), end = datetime(2020,2,1), step = timedelta(days = 7)):
    # process response
    pass

The date can be also specified one of following string formats:

  • %Y-%m-%d
  • %Y-%m-%d %H:%M
  • %Y-%m-%d %H:%M:%S
for response,version_time in WaybackMachine(url, start = "2020-05-01", end = "2020-02-01", step = timedelta(days = 7)):
    # process response
    pass

String representation of timedelta will be added.

Current version

If start is None, the looup will start with fetching current page (not from archive).

To avoid this behavior, manually set start = datetime.datetime.now().

Configurations

On frequent use-cases, custom configurations of parameters are added to the packages.

These consist of default parameter values.

So far following configurations are available:

  • default - start is now(), end is beginning of year of start (hence length can be 0 - 365 days), 1 day step
  • covid - start is now() (might be changed, if covid disappears), end is 2020-01-01, COVID-19 spread into the world after. In China the COVID-19 has already occurred before!. Step is 12 h.

Contribution

Developed by Martin Benes.

Join on GitHub.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

waybackmachine-0.2.0.tar.gz (7.0 kB view hashes)

Uploaded Source

Built Distribution

waybackmachine-0.2.0-py3-none-any.whl (5.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page