vantetider-scraper

A scraper of statistical data from Vantetider.se built on top of Statscraper.


Project description

This is a scraper for statistical data from http://www.vantetider.se built on top of the Statscraper package <https://github.com/jplusplus/statscraper>.

Install

pip install -r requirements.txt

The scraper has to make a large number of requests, so it uses requests-cache <https://pypi.python.org/pypi/requests-cache> to cache queries.
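To illustrate why caching helps here, the following is a minimal standard-library sketch of the idea: a response is stored under a key derived from the URL and the query parameters, so a repeated query is answered locally. This is not how requests-cache is implemented; QueryCache, fetch and fake_downloader are hypothetical names used only for illustration.

```python
import hashlib
import json


class QueryCache:
    """Toy in-memory cache keyed by URL and query parameters (illustration only)."""

    def __init__(self, downloader):
        # `downloader` is any callable (url, params) -> response body.
        self._downloader = downloader
        self._store = {}
        self.misses = 0

    @staticmethod
    def _key(url, params):
        # Sort keys so that equivalent queries map to the same cache entry.
        payload = json.dumps([url, params], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def fetch(self, url, params):
        key = self._key(url, params)
        if key not in self._store:
            # Cache miss: call the downloader once and remember the result.
            self.misses += 1
            self._store[key] = self._downloader(url, params)
        return self._store[key]


def fake_downloader(url, params):
    # Stand-in for a real HTTP request; no network access happens here.
    return "response for {} {}".format(url, sorted(params.items()))


cache = QueryCache(fake_downloader)
first = cache.fetch("http://www.vantetider.se", {"year": "2016", "region": "Blekinge"})
second = cache.fetch("http://www.vantetider.se", {"region": "Blekinge", "year": "2016"})
```

The second call passes the same parameters in a different order and is served from the cache, so the downloader runs only once.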

Example usage

from vantetider import VantetiderScraper

scraper = VantetiderScraper()
scraper.items  # List implemented datasets
# [<VantetiderDataset: VantatKortareAn60Dagar (Väntat kortare än 60 dagar )>, <VantetiderDataset: Overbelaggning (Överbeläggningar)>, <VantetiderDataset: PrimarvardTelefon (Telefontillgänglighet)>, <VantetiderDataset: PrimarvardBesok (Läkarbesök)>, <VantetiderDataset: SpecialiseradBesok (Förstabesök)>, <VantetiderDataset: SpecialiseradOperation (Operation/åtgärd)>]

dataset = scraper.get("Overbelaggning")  # Get a specific dataset

# List all available dimensions
print(dataset.dimensions)

print(dataset.regions)  # List available regions
print(dataset.years)  # List available years

# Make a query. You have to explicitly define all dimension values you want
# to query; dimensions you leave out fall back to their default values.
res = dataset.fetch({
  "region": "Blekinge",
  "year": "2016",
  "period": "Februari",
  # Currently we can only query by the id of a dimension value
  "type_of_overbelaggning": ["0", "1"], # "Somatik" and "Psykiatri"
  })

# Do something with the result
df = res.pandas
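
Because dimension values can currently only be queried by id, a small lookup can make queries easier to read. The sketch below assumes the id and label pairs given in the comment above ("0" for "Somatik", "1" for "Psykiatri"); the helper labels_to_ids is hypothetical and not part of the package.

```python
# Assumption: ids and labels as stated in the example above.
TYPE_LABELS = {"Somatik": "0", "Psykiatri": "1"}


def labels_to_ids(labels, lookup=TYPE_LABELS):
    """Translate human-readable labels to the ids the query expects."""
    unknown = [label for label in labels if label not in lookup]
    if unknown:
        raise ValueError("Unknown label(s): {}".format(", ".join(unknown)))
    return [lookup[label] for label in labels]


query = {
    "region": "Blekinge",
    "year": "2016",
    "period": "Februari",
    "type_of_overbelaggning": labels_to_ids(["Somatik", "Psykiatri"]),
}
```

The resulting query dict has the same shape as the one passed to dataset.fetch above.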

Practical application, using dataset.py for storage:

from vantetider import VantetiderScraper
from vantetider.allowed_values import TYPE_OF_OVERBELAGGNING, PERIODS
import dataset

db = dataset.connect('sqlite:///vantetider.db')

TOPIC = "Overbelaggning"

# Set up local db
table = db.create_table(TOPIC)
scraper = VantetiderScraper()

# Named "ds" so that it does not shadow the imported "dataset" module
ds = scraper.get(TOPIC)

# Get all available regions and years for query
years = [x.value for x in ds.years]
regions = [x.value for x in ds.regions]

# Query in chunks so that results can be stored in the database as we go
for region in regions:
    for year in years:
        res = ds.fetch({
            "year": year,
            "type_of_overbelaggning": [x[0] for x in TYPE_OF_OVERBELAGGNING],
            "period": PERIODS,
            "region": region,
            })
        df = res.pandas  # Optional: the result as a pandas DataFrame
        data = res.list_of_dicts
        table.insert_many(data)
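
The nested loops above issue one query per region and year. As a sketch of the same chunking pattern without the scraper, the generator below builds one query dict per (region, year) pair. The name build_queries and the sample values are hypothetical and only illustrate the shape of the queries.

```python
from itertools import product


def build_queries(regions, years, periods, type_ids):
    """Yield one query dict per (region, year) pair."""
    for region, year in product(regions, years):
        yield {
            "region": region,
            "year": year,
            "period": periods,
            "type_of_overbelaggning": type_ids,
        }


# Sample values for illustration; the real ones come from the dataset.
queries = list(build_queries(
    regions=["Blekinge", "Dalarna"],
    years=["2015", "2016"],
    periods=["Januari", "Februari"],
    type_ids=["0", "1"],
))
```

Keeping each chunk small means that an interrupted run loses at most one region and year of data.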

TODO

Implement scraping of “Aterbesok”, “Undersokningar”, “BUPdetalj”, “BUP”.
Enable querying by label name on all dimensions.
Add more allowed values to vantetider/allowed_values.py.
Make requests-cache optional.

Develop

Run tests:

make tests

Release history

Version	Released
0.2.0	Feb 8, 2021
0.1.9	Dec 2, 2020
0.1.8	Mar 13, 2019
0.1.7	Jan 25, 2019
0.1.6	May 3, 2018
0.1.5	Feb 14, 2018 (this version)
0.1.4	Nov 6, 2017
0.1.3	Nov 6, 2017
0.1.2	Nov 6, 2017

Download files


Source Distribution

vantetider_scraper-0.1.5.tar.gz (10.5 kB)

Uploaded Feb 14, 2018 Source

Hashes for vantetider_scraper-0.1.5.tar.gz

Algorithm	Hash digest
SHA256	`f00335237404151b98eb922903b61c7034b904c1598ac214eb4bcefbc5b67e73`
MD5	`a2c4790897e64d522d0f22f190507789`
BLAKE2b-256	`74092e2c2ebe723103bebd03a40455e513edf0cef404a05bfe64e9c76bfcf0f2`