Wrapper for the archive.org API.
Project description
Wayback Machine
This project is a wrapper for easily fetching historical versions of a page from the archive.org API.
The fetched pages can then be used for subsequent web scraping.
Setup and usage
Install from pip with
pip install waybackmachine
Simple usage of the WaybackMachine class looks like this:
from waybackmachine import WaybackMachine
url = "https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/aktuella-utbrott/covid-19/bekraftade-fall-i-sverige/"
for response in WaybackMachine(url):
    # process response
    pass
In the code above, requests start with the newest version (the url itself) and then go back in history to progressively older versions saved in the archive.
Parameterization will be broadened later to be more general. Currently the project is used for fetching COVID-19 data.
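The newest-to-oldest order can be sketched as follows. This is only an illustration, not the package's actual implementation: the helper name `snapshot_queries` and the one-day step are assumptions, and the sketch only builds query URLs for the public Wayback Machine availability endpoint (`https://archive.org/wayback/available`) without sending any requests.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def snapshot_queries(url, start, end, step_days=1):
    """Yield availability-query URLs from `start` back to `end` (inclusive).

    Hypothetical helper: the name, the signature and the one-day default
    step are assumptions made for this sketch.
    """
    current = start
    while current >= end:
        # The endpoint expects the timestamp as YYYYMMDD.
        query = urlencode({"url": url, "timestamp": current.strftime("%Y%m%d")})
        yield f"{AVAILABILITY_ENDPOINT}?{query}"
        current -= timedelta(days=step_days)


if __name__ == "__main__":
    for query_url in snapshot_queries(
        "https://example.com/", date(2020, 3, 3), date(2020, 3, 1)
    ):
        print(query_url)
```

Each generated URL could then be requested with any HTTP client; the endpoint answers with JSON describing the snapshot closest to the given timestamp.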
Upgrade to the latest version with
pip install --upgrade waybackmachine
Parametrization
date
By default the start date is today. The end date is currently fixed at 2020-03-01.
Date handling will be made more general in the future.
from waybackmachine import WaybackMachine
url = "https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/aktuella-utbrott/covid-19/bekraftade-fall-i-sverige/"
for response in WaybackMachine(url, "2020-04-01"): # start from 1st April 2020 and go back
    # process response
    pass
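The date window described above (an optional start date, defaulting to today, and a fixed end date of 2020-03-01) can be sketched as below. The function `resolve_window` and the constant `FIXED_END` are hypothetical names introduced for illustration, and the check that the start must not precede the end is an assumption; the package itself may handle dates differently.

```python
from datetime import date
from typing import Optional, Tuple

# End date stated in this README; the constant name is an assumption.
FIXED_END = date(2020, 3, 1)


def resolve_window(start: Optional[str] = None) -> Tuple[date, date]:
    """Return (start, end) for the fetch window.

    `start` is an ISO date string such as "2020-04-01"; when omitted,
    today's date is used, mirroring the default described above.
    """
    start_date = date.fromisoformat(start) if start else date.today()
    if start_date < FIXED_END:
        # Assumed behaviour for this sketch only.
        raise ValueError(f"start date {start_date} is before {FIXED_END}")
    return start_date, FIXED_END


if __name__ == "__main__":
    start_date, end_date = resolve_window("2020-04-01")
    print(start_date, end_date, (start_date - end_date).days)
```

With a start of "2020-04-01" the window spans 31 days back to the fixed end date.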
Contribution
Developed by Martin Benes.
Join on GitHub.
Built Distribution
Hashes for waybackmachine-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 978b61ed13a1b32650b8ed319a975830e26bef6a3190288bdcb9ed9b3dd6a349
MD5 | 043a6a09710ea9f25ea6b8a82878c83a
BLAKE2b-256 | 1fc22d4573234e354ede5019c0a136fb8b6cb921b943b5a95b8f7792135e396a