A light-weight and flexible validation package for pandas data structures.
Project description
Pandera
A flexible and expressive pandas validation library.
Why?
pandas
data structures hide a lot of information, and explicitly
validating them at runtime in production-critical or reproducible research
settings is a good idea. pandera
enables users to:
- Check the types and properties of columns in a
DataFrame
or values in aSeries
. - Perform more complex statistical validation like hypothesis testing.
- Seamlessly integrate with existing data analysis/processing pipelines via function decorators.
pandera
provides a flexible and expressive API for performing data validation
on tidy (long-form) and wide data to make data processing pipelines more
readable and robust.
Documentation
The official documentation is hosted on ReadTheDocs: https://pandera.readthedocs.io
Install
Using pip:
pip install pandera
Using conda:
conda install -c cosmicbboy pandera
Example Usage
DataFrameSchema
import pandas as pd
from pandera import Column, DataFrameSchema, Float, Int, String, Check
# validate columns
schema = DataFrameSchema({
# the check function expects a series argument and should output a boolean
# or a boolean Series.
"column1": Column(Int, Check(lambda s: s <= 10)),
"column2": Column(Float, Check(lambda s: s < -1.2)),
# you can provide a list of validators
"column3": Column(String, [
Check(lambda s: s.str.startswith("value_")),
Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
]),
})
# alternatively, you can pass strings representing the legal pandas datatypes:
# http://pandas.pydata.org/pandas-docs/stable/basics.html#dtypes
schema = DataFrameSchema({
"column1": Column("int64", Check(lambda s: s <= 10)),
...
})
df = pd.DataFrame({
"column1": [1, 4, 0, 10, 9],
"column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
"column3": ["value_1", "value_2", "value_3", "value_2", "value_1"]
})
validated_df = schema.validate(df)
print(validated_df)
# column1 column2 column3
# 0 1 -1.3 value_1
# 1 4 -1.4 value_2
# 2 0 -2.9 value_3
# 3 10 -10.1 value_2
# 4 9 -20.4 value_1
Tests
pip install pytest
pytest tests
Contributing to pandera
All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.
A detailed overview on how to contribute can be found in the contributing guide on GitHub.
Issues
Go here to submit feature requests or bugfixes.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.