Probabilistic type inference

1 Introduction

Type inference refers to the task of inferring the data type (e.g., Boolean, date, integer and string) of a given column of data, which becomes challenging in the presence of missing data and anomalies.

https://raw.githubusercontent.com/alan-turing-institute/ptype/release/notes/motivation.png

In the right-hand figure, normal, missing and anomalous values are shown in green, yellow and red, respectively.

ptype is a probabilistic type inference model for tabular data that aims to robustly infer the data type of each column in a table. By accounting for missing data and anomalies, ptype improves on existing type inference methods. This repository provides a Python implementation of ptype.
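
To make the difficulty concrete, the following is a minimal sketch of why missing data and anomalies break a strict rule-based approach. It is not ptype's probabilistic model and does not use the ptype API; the names `MISSING_MARKERS`, `strict_infer` and `tolerant_infer`, the set of missing-value markers and the 0.5 threshold are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical set of strings treated as missing-value markers (an assumption).
MISSING_MARKERS = {"", "NA", "N/A", "null", "-"}


def is_boolean(value):
    return value.lower() in {"true", "false"}


def is_integer(value):
    try:
        int(value)
        return True
    except ValueError:
        return False


def is_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


CHECKS = [("boolean", is_boolean), ("integer", is_integer), ("date", is_date)]


def strict_infer(column):
    """Return the first type that every entry satisfies, else 'string'."""
    for name, check in CHECKS:
        if all(check(v) for v in column):
            return name
    return "string"


def tolerant_infer(column, threshold=0.5):
    """Drop missing markers, then pick the type matched by the largest share
    of the remaining entries; fall back to 'string' below the threshold."""
    present = [v for v in column if v not in MISSING_MARKERS]
    if not present:
        return "string"
    best_name, best_share = "string", 0.0
    for name, check in CHECKS:
        share = sum(check(v) for v in present) / len(present)
        if share > best_share:
            best_name, best_share = name, share
    return best_name if best_share >= threshold else "string"


# A mostly-integer column with one missing marker and one anomaly.
ages = ["34", "27", "NA", "41", "unknown", "19"]
print(strict_infer(ages))    # the single "NA" forces the fallback to string
print(tolerant_infer(ages))  # the majority type survives the noise
```

The strict rule labels the whole column as a string because of two non-conforming entries, whereas the tolerant heuristic recovers the integer type. A fixed marker list and threshold are brittle, which is the kind of limitation a probabilistic model is meant to address.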

If you use this package, please cite ptype with the following BibTeX entry:

@article{ceritli2020ptype,
  title={ptype: probabilistic type inference},
  author={Ceritli, Taha and Williams, Christopher KI and Geddes, James},
  journal={Data Mining and Knowledge Discovery},
  year={2020},
  volume = {34},
  number = {3},
  pages={870--904},
  doi = {10.1007/s10618-020-00680-1},
}

2 Install requirements

pip install -r requirements.txt

3 Usage

See the demo notebooks in the notebooks folder, or view them online via Binder.
