Wrapper for Great Expectations to fit the requirements of the Gemeente Amsterdam.

Introduction

This repository contains functions that ease the use of Great Expectations. Users supply dataframes and data quality rules and get the validation results in return.

DISCLAIMER: This repository is in the proof-of-concept (PoC) phase.

Getting Started

Install the packages in your workspace:

pip install great_expectations
pip install dq-suite-amsterdam

Then import the package and prepare the inputs:

import dq_suite
  • Define 'dfs' as a list of dataframes that require a data quality (DQ) check
  • Define 'dq_rules' as a JSON, as shown in dq_rules_example.json in this repo

Run the check:

results, brontabel_df, bronattribute_df, dqRegel_df = dq_suite.df_check(dfs, dq_rules, "showcase")
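
To prepare the 'dq_rules' input, a minimal sketch like the one below may help. It reads a rules file and confirms that the text parses as JSON before it is handed to df_check. The helper name load_dq_rules is hypothetical and not part of dq-suite-amsterdam, and the sketch assumes that df_check accepts the rules as JSON text, which this README does not state explicitly. The df_check call itself is left in comments because it needs the installed package and real dataframes.

```python
import json
from pathlib import Path


def load_dq_rules(path):
    """Read a rules file and return its text after checking it parses as JSON.

    Hypothetical helper, not part of dq-suite-amsterdam.
    """
    text = Path(path).read_text(encoding="utf-8")
    json.loads(text)  # raises json.JSONDecodeError on malformed input
    return text


# Usage (assumes dq-suite-amsterdam is installed and 'dfs' is already defined;
# passing JSON text rather than a parsed dict is an assumption):
# import dq_suite
# dq_rules = load_dq_rules("dq_rules_example.json")
# results, brontabel_df, bronattribute_df, dqRegel_df = dq_suite.df_check(dfs, dq_rules, "showcase")
```

Failing early on a malformed rules file keeps the error close to its cause, rather than surfacing later inside the check.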

Known exceptions

The functions can run on Databricks using a Personal Compute Cluster or a Job Cluster. Using a Shared Compute Cluster will result in an error, as it does not have the permissions that Great Expectations requires.

Updates

version = "0.1.0" : dq_rules_example.json is updated. Added: "dataframe_parameters": { "unique_identifier": "id" }

version = "0.2.0" : dq_rules_example.json is updated. Added for each table:

{
  "dataframe_parameters": [
    {
      "unique_identifier": "id",
      "table_name": "well",
      "rules": [
        {
          "rule_name": "expect_column_values_to_be_between",
          "parameters": [
            { "column": "latitude", "min_value": 6, "max_value": 10000 }
          ]
        }
      ]
    },
    ....
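
The 0.2.0 layout can also be built and sanity-checked in Python before use. The sketch below covers only the keys visible in the entry above; the full dq_rules_example.json may contain more. The helper find_layout_problems is hypothetical and not part of dq-suite-amsterdam. It only checks that the expected keys are present, not that the rule names or parameter values are valid for Great Expectations.

```python
import json

# Rules in the 0.2.0 layout, limited to the keys visible in the changelog entry.
dq_rules = {
    "dataframe_parameters": [
        {
            "unique_identifier": "id",
            "table_name": "well",
            "rules": [
                {
                    "rule_name": "expect_column_values_to_be_between",
                    "parameters": [
                        {"column": "latitude", "min_value": 6, "max_value": 10000}
                    ],
                }
            ],
        }
    ]
}


def find_layout_problems(rules):
    """Return a list of readable problems; an empty list means none were found.

    Hypothetical helper, not part of dq-suite-amsterdam.
    """
    tables = rules.get("dataframe_parameters")
    if not isinstance(tables, list):
        return ["'dataframe_parameters' must be a list of table entries"]
    problems = []
    for i, table in enumerate(tables):
        # Every table entry is expected to carry these three keys.
        for key in ("unique_identifier", "table_name", "rules"):
            if key not in table:
                problems.append(f"table {i}: missing '{key}'")
        # Every rule is expected to name an expectation and its parameters.
        for j, rule in enumerate(table.get("rules", [])):
            for key in ("rule_name", "parameters"):
                if key not in rule:
                    problems.append(f"table {i}, rule {j}: missing '{key}'")
    return problems


print(find_layout_problems(dq_rules))

# Serialize to JSON text, for example to save as a rules file.
dq_rules_json = json.dumps(dq_rules, indent=2)
```

For the example rules above, the check finds no problems and prints an empty list.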
