PyHCUP

Python tools for working with data from the Healthcare Cost and Utilization Project (http://hcup-us.ahrq.gov).

Project description

PyHCUP is a Python library for parsing and importing data obtained from the Healthcare Cost and Utilization Project (HCUP, http://hcup-us.ahrq.gov).

Most of the data provided by HCUP comes as fixed-width text files (ASCII, *.asc), with the metadata that describes each column kept in separate load files. This library reads the SAS-format load files (*.sas).
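To make the fixed-width idea concrete, here is a minimal standard-library sketch that is independent of PyHCUP. The helper name parse_fixed_width and the column layout are made up for illustration; a real layout would come from the load file.

```python
def parse_fixed_width(line, columns):
    """Slice one fixed-width record into a dict of stripped strings.

    columns is a list of (name, width) pairs in file order.
    """
    record = {}
    start = 0
    for name, width in columns:
        record[name] = line[start:start + width].strip()
        start += width
    return record


# Hypothetical layout, not taken from any real HCUP load file.
columns = [("AGE", 3), ("SEX", 2), ("ZIP", 5)]
row = parse_fixed_width(" 42 110001", columns)
```

Every value comes back as a string here; type conversion is a separate step that the metadata would also have to drive.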

Example Usage

Load a datafile/loadfile combination.

import pyhcup

#specify where your data and loadfiles live
datafile = 'D:\\Users\\hcup\\sid\\NY_SID_2009_CORE.asc'
loadfile = 'D:\\Users\\hcup\\sid\\sasload\\NY_SID_2009_CORE.sas'

#pull basic meta from SAS loadfile
meta_df = pyhcup.meta_from_sas(loadfile)

#use meta knowledge to parse datafile into a pandas DataFrame
df = pyhcup.read(datafile, meta_df)
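For a rough sense of what the load file contributes, the sketch below pulls column names, start positions, and widths out of a SAS INPUT statement with a regular expression. It is not how meta_from_sas() is implemented. The sample text, the helper name columns_from_input, and the assumption that every field is written as "@start NAME informat." are illustrative only.

```python
import re

# Assumed field shape: "@<start> <NAME> <informat>." where the first integer
# in the informat is the field width (for example N3PF., $CHAR5., or 14.).
FIELD_RE = re.compile(r"@(\d+)\s+(\w+)\s+(\$?[A-Za-z]*)(\d+)")


def columns_from_input(text):
    """Return (name, start, width) tuples found in a SAS INPUT statement."""
    return [
        (name, int(start), int(width))
        for start, name, _informat, width in FIELD_RE.findall(text)
    ]


# Hypothetical excerpt written for this example.
sample = """
INPUT
    @1   AGE   N3PF.
    @4   ZIP   $CHAR5.
    @9   KEY   14.
;
"""
columns = columns_from_input(sample)
```

A real load file carries more than this (labels, missing-value handling, numeric scale), so treat the regular expression as a starting point rather than a complete parser.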

There are two ways to deal with very large files that cannot be held in memory.

To import a subset of rows, such as for preliminary work or troubleshooting, pass nrows (the number of rows to read) and/or skiprows (the number of rows to skip) to pyhcup.read().

#optionally specify nrows and/or skiprows to handle larger files
df = pyhcup.read(datafile, meta_df, nrows=500000, skiprows=1000000)
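The skip-then-read behaviour can be pictured with a small standard-library sketch. The helper read_window is hypothetical, and it assumes the usual reading of these parameters: skiprows lines are discarded first, then at most nrows lines are returned.

```python
import io
from itertools import islice


def read_window(handle, skiprows=0, nrows=None):
    """Return up to nrows lines from handle after skipping skiprows lines."""
    stop = None if nrows is None else skiprows + nrows
    # islice consumes lines lazily, so skipped lines are never stored
    return [line.rstrip("\n") for line in islice(handle, skiprows, stop)]


# An in-memory stand-in for a data file with five records
handle = io.StringIO("a\nb\nc\nd\ne\n")
window = read_window(handle, skiprows=1, nrows=2)
```

Skipped lines still have to be read from disk, so a window far into a large file saves memory rather than I/O time.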

To iterate through chunks of rows, such as for importing into a database, pass a chunksize to pyhcup.read() along with the metadata, which supplies the column names and widths. The call then creates a generator yielding manageable-sized chunks instead of a single DataFrame.

chunk_size = 500000
reader = pyhcup.read(datafile, meta_df, chunksize=chunk_size)
for df in reader:
    # do your business here,
    # such as replacing sentinel values (below)
    # or inserting into a database with another Python library
    pass
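As one possible shape for the database step, the sketch below batches plain tuples and writes each batch to an in-memory SQLite table in its own transaction. It uses only the standard library and does not involve PyHCUP; the chunks helper, the table name core, and the records are invented for the example.

```python
import sqlite3
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most size items from iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Made-up (id, age) records standing in for parsed rows
rows = [(i, i % 90) for i in range(10)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE core (id INTEGER, age INTEGER)")
for chunk in chunks(rows, 4):
    with conn:  # commit each chunk as a single transaction
        conn.executemany("INSERT INTO core VALUES (?, ?)", chunk)

loaded = conn.execute("SELECT COUNT(*) FROM core").fetchone()[0]
```

Committing per chunk keeps memory use bounded and means a failure partway through loses at most one chunk of work.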

Whether you are pulling in all records or just a chunk of records, you can also replace the missing/invalid data placeholders (sentinel values) that HCUP uses. The replacement is tailored to HCUP conventions, so it is less useful for generically parsing missing values in non-HCUP files.

# note: this bulldozes through all values in all columns, with no per-column control
replaced = pyhcup.replace_sentinels(df)
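The sketch below shows the general idea of sentinel replacement on a single record of string values. It is not PyHCUP's implementation. The pattern is an assumption for illustration (a minus sign followed by one repeated digit from 5 to 9, such as -9, -88, or -666); check the HCUP documentation for the codes used in your files.

```python
import re

# Assumed sentinel shape: "-" followed by a single digit 5-9, repeated.
SENTINEL_RE = re.compile(r"^-([5-9])\1*$")


def replace_sentinels_in_row(row, replacement=None):
    """Return a copy of row with sentinel-looking strings replaced."""
    return {
        key: replacement if SENTINEL_RE.match(value.strip()) else value
        for key, value in row.items()
    }


# Made-up record: AGE and ZIP hold sentinels, LOS and DX hold real values
row = {"AGE": "-99", "LOS": "4", "ZIP": "-8", "DX": "-12"}
cleaned = replace_sentinels_in_row(row)
```

Like replace_sentinels(), this applies one rule to every column, so a column in which a value such as -9 is legitimate data would need to be excluded before calling it.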

Release history

0.1.6.4 (Aug 16, 2015)
0.1.6.3.9 (Aug 10, 2015)
0.1.6.3.8 (Feb 9, 2015)
0.1.6.3.7 (Jan 5, 2015)
0.1.6.3.6dev (pre-release, Jun 13, 2014)
0.1.6.3.5dev (pre-release, Jun 5, 2014)
0.1.6.3.4 (May 29, 2014)
0.1.6.3.3dev (pre-release, May 16, 2014)
0.1.6.3.2dev (pre-release, May 15, 2014)
0.1.6.3.1dev (pre-release, Apr 29, 2014)
0.1.6.3dev (pre-release, Apr 28, 2014)
0.1.6.2.3dev (pre-release, Apr 22, 2014)
0.1.6.2.1dev (pre-release, Apr 22, 2014)
0.1.6.2dev (pre-release, Apr 17, 2014)
0.1.6.1 (Apr 14, 2014)
0.1.6.0 (Feb 18, 2014)
0.1.5.9dev (pre-release, Feb 3, 2014)
0.1.5.7 (Jan 7, 2014)
0.1.5.7dev (pre-release, Dec 31, 2013; this version)
0.1.5.6 (Dec 26, 2013)
0.1.5.5dev (pre-release, Dec 23, 2013)
0.1.5.4dev (pre-release, Dec 18, 2013)
0.1.5.3dev (pre-release, Dec 18, 2013)
0.1.5.2dev (pre-release, Dec 11, 2013)
0.1.5.1dev (pre-release, Dec 11, 2013)
0.1.5 (Dec 9, 2013)
0.1.5dev (pre-release, Dec 9, 2013)
0.1.4dev (pre-release, Dec 3, 2013)
0.1.3 (Nov 27, 2013)
0.1.2.3 (Nov 26, 2013)
0.1.2.2 (Nov 25, 2013)
0.1.2.1 (Nov 25, 2013)
0.1.2 (Nov 25, 2013)
0.1.2dev (pre-release, Nov 25, 2013)
0.1.1 (Nov 20, 2013)
0.1.1dev (pre-release, Nov 20, 2013)
0.1.0dev (pre-release, Nov 20, 2013)
