
A collection of measures for Approximate Functional Dependencies in relational data.


AFD comparative study

This repository contains all artifacts for the paper "Approximately Measuring Functional Dependencies: a Comparative Study".

Overview

  • code: this directory holds the code used to generate the results in the paper
    • afd_measures: all Python source code relating to the implemented AFD measures
    • experiments: Jupyter notebooks containing the processing steps to generate the results, figures or tables in the paper
    • synthetic_data: all Python source code relating to the synthetic data generation process
  • data: the datasets used in the paper
    • rwd: manually annotated dataset of files found on the web (see data/ground_truth.csv)
    • rwd_e: datasets from rwd with errors introduced into them. Generated by the notebook code/experiments/create_rwd_e_dataset.ipynb.
    • syn_e: synthetic dataset focusing on errors. Generated by the notebook code/experiments/create_syn_e.ipynb.
    • syn_u: synthetic dataset focusing on left-hand side uniqueness. Generated by the notebook code/experiments/create_syn_u.ipynb.
    • syn_s: synthetic dataset focusing on right-hand side skewness. Generated by the notebook code/experiments/create_syn_s.ipynb.
  • paper: the full version of the paper, including all proofs.
  • results: results of applying the AFD measures to the datasets.
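To give a feel for what an AFD measure computes, here is a minimal sketch of one classic measure from the AFD literature, g3 (Kivinen and Mannila): the smallest fraction of rows that must be deleted so that the dependency X → Y holds exactly. It is plain Python and is not necessarily how code/afd_measures implements it; the function name g3_error and its two-sequence interface are hypothetical, and it reports g3 as an error (0.0 means the FD holds exactly).

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def g3_error(lhs: Sequence[Hashable], rhs: Sequence[Hashable]) -> float:
    """Hypothetical helper: g3 error of the dependency lhs -> rhs.

    lhs[i] and rhs[i] are the X and Y values of row i. For a multi-column
    left-hand side, pass one tuple per row.
    """
    if len(lhs) != len(rhs):
        raise ValueError("lhs and rhs must have the same length")
    n = len(lhs)
    if n == 0:
        return 0.0  # an empty relation trivially satisfies every FD

    # Count how often each Y value occurs within each X group.
    groups = defaultdict(Counter)
    for x, y in zip(lhs, rhs):
        groups[x][y] += 1

    # Within every X group, keep the rows carrying the most frequent Y value;
    # all other rows in that group would have to be deleted.
    kept = sum(max(counts.values()) for counts in groups.values())
    return (n - kept) / n


if __name__ == "__main__":
    zip_codes = ["10115", "10115", "10117", "10117"]
    cities = ["Berlin", "Berlin", "Berlin", "Potsdam"]
    # One of the four rows violates zip -> city, so g3 = 1/4.
    print(g3_error(zip_codes, cities))
```

Grouping by the left-hand side and keeping the majority right-hand-side value per group is what makes the count minimal: any row outside a group's majority must go, and keeping the majority removes the fewest rows.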

Installation

Use the code in this repository with Poetry or Conda.

Poetry

Install all dependencies via Poetry and start JupyterLab inside the Poetry environment to explore the code.

$ poetry install
$ poetry run jupyter lab

Conda

Create a new environment from the conda_environment.yaml file, activate it and run JupyterLab to explore the code.

$ conda env create -f conda_environment.yaml
$ conda activate <environment name defined in conda_environment.yaml>
$ jupyter lab

Dataset References

In addition to this repository, our benchmark is also available on Zenodo.



Download files


Source distribution: afd_measures-0.9.1.tar.gz (7.6 kB)

Built distribution: afd_measures-0.9.1-py3-none-any.whl (7.6 kB, Python 3)
