
ORM-like package for defining, loading, and validating table schemas in pandas.


Demo Usage

Have a look at this Demo Notebook

Description

A Python package that supports the iterative process of developing schema-like representations of pandas DataFrames and using them to recode and validate instances of that data.

This is a very young attempt at solving a recurrent problem many people have. So far I have looked at multiple solutions, but none really did it for me.

I need to load, recode, and validate tables all day, every day. Sometimes it's simple; you can call pandas.read_table() and all is good. But sometimes you have a 400-column REDCap data dump that is complicated af, and you need to develop your recoding logic through an iterative process.

This is an attempt to apply a sort of “test driven development” approach to data cleaning.

Basic Workflow

  1. For each column that you care about in your source table:

    1. Define a Column object that represents the ideal state of your data by passing a list of small, independent, reusable validator functions and some descriptive information.

    2. Use this object to validate the column data from your source table.

      • It WILL fail.

    3. Add small, composable, reusable recoding functions to the column object and iterate until your validations pass.

  2. Define an Enforcer object by passing it a list of your column representation objects.

  3. This enforcer can be used to recode or validate recoded tables of the same kind as your source table wherever your applications use that type of data.
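The workflow above can be sketched in plain pandas. This is only a minimal illustration of the idea: the `Column` and `Enforcer` classes below are simplified, hypothetical stand-ins written for this example, and their constructor arguments and method names are assumptions, not the package's actual API. See the Demo Notebook for real usage.

```python
import pandas as pd


# --- Hypothetical stand-ins, NOT table_enforcer's real classes ---
class Column:
    """Ideal state of one column: validators plus optional recoders."""

    def __init__(self, name, validators=None, recoders=None, description=""):
        self.name = name
        self.validators = list(validators or [])
        self.recoders = list(recoders or [])
        self.description = description

    def recode(self, table):
        # Apply each recoder in order to the source column.
        series = table[self.name]
        for recoder in self.recoders:
            series = recoder(series)
        return series

    def validate(self, series):
        # One boolean column per validator, one row per value.
        return pd.DataFrame({v.__name__: v(series) for v in self.validators})


class Enforcer:
    """Recode or validate a whole table from a list of Column objects."""

    def __init__(self, columns):
        self.columns = list(columns)

    def recode(self, table):
        return pd.DataFrame({c.name: c.recode(table) for c in self.columns})

    def validate(self, table):
        return all(
            bool(c.validate(table[c.name]).all().all()) for c in self.columns
        )


# Small, independent, reusable validator.
def is_male_or_female(series):
    return series.isin(["male", "female"])


# Small, composable recoders, added after the first validation fails.
def strip_and_lower(series):
    return series.str.strip().str.lower()


def expand_sex_codes(series):
    return series.replace({"m": "male", "f": "female"})


raw = pd.DataFrame({"sex": ["M", "F ", "male", "f"]})

sex = Column(
    name="sex",
    validators=[is_male_or_female],
    recoders=[strip_and_lower, expand_sex_codes],
    description="Participant sex, spelled out in lower case.",
)
enforcer = Enforcer(columns=[sex])

raw_is_valid = enforcer.validate(raw)          # the raw table fails
recoded = enforcer.recode(raw)
recoded_is_valid = enforcer.validate(recoded)  # the recoded table passes
```

Keeping validators and recoders as plain functions that take and return a Series is what lets them be reused across columns and tested one at a time.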

Please take a look and offer thoughts/advice.

Features

  • Enforcer and Column classes to define what columns should look like in a table.

  • Small but growing cadre of built-in validator functions and decorators.

  • Decorators for use in defining parameterized validators like between_4_and_60().

  • Declaration syntax for Enforcer is loosely based on SQLAlchemy’s Table pattern.
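As a rough idea of how a decorator can turn a generic check into a parameterized validator such as between_4_and_60(), here is a sketch in plain pandas. The `minmax` decorator factory and its `low`/`high` parameters are assumptions made for this example; the package's built-in decorators may be named and behave differently.

```python
import functools

import pandas as pd


def minmax(low, high):
    """Hypothetical decorator factory (an assumption, not the package's API).

    The decorated function may pre-process the series; the wrapper then
    checks that each resulting value lies in the closed range [low, high].
    """

    def decorator(func):
        @functools.wraps(func)  # keep the validator's name and docstring
        def validator(series):
            values = func(series)
            return (values >= low) & (values <= high)

        return validator

    return decorator


@minmax(low=4, high=60)
def between_4_and_60(series):
    """Check that each value is between 4 and 60, inclusive."""
    return series


ages = pd.Series([3, 4, 30, 60, 61])
result = between_4_and_60(ages)  # boolean Series, one entry per value
```

Because the wrapper preserves the function name, a validator built this way can still be reported by name when a column fails validation.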

Credits

This package was created with Cookiecutter and the xguse/cookiecutter-pypackage project template which is based on audreyr/cookiecutter-pypackage.

History

v0.1.2 / 2017-11-17

  • flake8

  • set up basic testing

  • changed travis build settings

  • updated usage demo and readme

v0.1.1 / 2017-11-16

  • Added usage notebook link to docs.

  • reorganized import strategy of Enforcer/Column objs

  • added more builtin validators/recoders/decorators

  • updated reqs

  • initialized travis integration

  • updated docs

  • Added usage demo notebook for docs

  • updated ignore patterns

  • validators.py: renamed

v0.1.0 / 2017-11-15

  • first minimally functional package

  • Enforcer and Column classes defined and operational

  • small cadre of built-in validator functions and decorators

  • ignore jupyter stuff

  • linter setups

v0.0.1 / 2017-11-14

  • First commit
