Wrapper for the archive.org API.

Project description

Wayback Machine

This project is a wrapper for simple fetching of historical versions of a page from the archive.org API.

The fetched pages can then be used for web scraping.

Setup and usage

Install from PyPI with pip:

pip install waybackmachine

Basic usage of the WaybackMachine class looks like this:

from waybackmachine import WaybackMachine

url = "https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/aktuella-utbrott/covid-19/bekraftade-fall-i-sverige/"
for response in WaybackMachine(url):
    # process response
    pass

Requests start with the newest version (the URL itself) and then go back in history to progressively older versions saved in the archive.
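
The package's internals are not described here, so the following is only a sketch of how a newest-to-oldest walk could be expressed against the public Wayback Machine availability endpoint (https://archive.org/wayback/available, which takes url and timestamp query parameters). The helper names dates_backwards and availability_query, the one-day step, and the example.com URL are illustrative assumptions, not part of waybackmachine. The code only builds the query URLs and makes no network requests.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def dates_backwards(start: date, end: date, step_days: int = 1):
    """Yield dates from `start` back to `end` (inclusive), newest first.

    Hypothetical helper; the one-day step is an assumption.
    """
    if step_days < 1:
        raise ValueError("step_days must be at least 1")
    current = start
    while current >= end:
        yield current
        current -= timedelta(days=step_days)


def availability_query(url: str, day: date) -> str:
    """Build a query asking for the snapshot closest to `day`."""
    params = {"url": url, "timestamp": day.strftime("%Y%m%d")}
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


# Build queries for three days, newest first (no request is sent).
queries = [
    availability_query("https://example.com/", day)
    for day in dates_backwards(date(2020, 3, 3), date(2020, 3, 1))
]
for query in queries:
    print(query)
```

Each query URL could then be fetched with any HTTP client. Iterating from the newest date down mirrors the order described above.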

Parameterization will later be broadened to be more general. The project is currently used for fetching COVID-19 data.

To upgrade an existing installation:

pip install --upgrade waybackmachine

Parameterization

date

By default, the start date is today. The end date is currently fixed at 2020-03-01.

Date handling will be made more general in the future.

from waybackmachine import WaybackMachine

url = "https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/aktuella-utbrott/covid-19/bekraftade-fall-i-sverige/"

for response in WaybackMachine(url, "2020-04-01"): # start from 1st April 2020 and go back
    # process response
    pass
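
As one possible way to fill in the "process response" step for web scraping, the sketch below extracts the page title from an HTML string using only the standard library. The names TitleParser and extract_title are hypothetical. The type of the objects yielded by WaybackMachine is not stated here, so obtaining the HTML from them (for example through a requests-style text attribute) is an assumption.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; only the first <title> is used.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Return the stripped title text, or an empty string if none exists."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


sample = "<html><head><title> Confirmed cases </title></head><body></body></html>"
title = extract_title(sample)
print(title)
```

Inside the loop, a call such as extract_title(response.text) would apply this to each archived version, provided the response really exposes its HTML that way.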

Contribution

Developed by Martin Benes.

Join on GitHub.

Download files

Source Distribution

waybackmachine-0.0.1.tar.gz (3.4 kB)

Built Distribution

waybackmachine-0.0.1-py3-none-any.whl (4.4 kB, Python 3)