pyCANON, A Python library to check the level of anonymity of a dataset
pyCANON is a Python library and CLI that computes the values of the parameters associated with the most common anonymization-based privacy-preserving techniques.
Authors: Judith Sáinz-Pardo Díaz and Álvaro López García (IFCA - CSIC).
Installation
We recommend using Python 3 with virtualenv:
virtualenv .venv -p python3
source .venv/bin/activate
Then run the following command to install the library and all its requirements:
pip install pycanon
Documentation
The pyCANON documentation is hosted on Read the Docs.
Getting started
Example using the adult dataset:
import pandas as pd
from pycanon import anonymity, report
FILE_NAME = "adult.csv"
QI = ["age", "education", "occupation", "relationship", "sex", "native-country"]
SA = ["salary-class"]
DATA = pd.read_csv(FILE_NAME)
# Calculate k for k-anonymity:
k = anonymity.k_anonymity(DATA, QI)
# Print the anonymity report:
report.print_report(DATA, QI, SA)
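To make the meaning of k concrete, here is a minimal pandas-only sketch of the k-anonymity definition: k is the size of the smallest group of records that share the same quasi-identifier values. The function name `k_anonymity_sketch` and the toy data are hypothetical and only illustrate the definition; this is not pyCANON's implementation.

```python
import pandas as pd


def k_anonymity_sketch(data: pd.DataFrame, quasi_identifiers: list) -> int:
    """Return the size of the smallest equivalence class.

    An equivalence class is a set of rows that share the same values
    for every quasi-identifier. Hypothetical helper, not part of pyCANON.
    """
    class_sizes = data.groupby(quasi_identifiers).size()
    return int(class_sizes.min())


# Toy, already-generalized data (invented for illustration only).
toy = pd.DataFrame(
    {
        "age": ["30-40", "30-40", "30-40", "40-50", "40-50"],
        "sex": ["F", "F", "F", "M", "M"],
        "salary-class": [">50K", "<=50K", "<=50K", ">50K", ">50K"],
    }
)

# Two equivalence classes of sizes 3 and 2, so k is 2.
k_toy = k_anonymity_sketch(toy, ["age", "sex"])
print(k_toy)
```

On real data, `anonymity.k_anonymity(DATA, QI)` from the example above is the call to use; the sketch only shows what the returned number represents.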
Description
pyCANON checks whether the following privacy-preserving techniques are satisfied and computes the values of the parameters associated with each of them.
| Technique | pyCANON function | Parameters | Notes |
|---|---|---|---|
| k-anonymity | k_anonymity | k: int | |
| (α, k)-anonymity | alpha_k_anonymity | α: float, k: int | |
| ℓ-diversity | l_diversity | ℓ: int | |
| Entropy ℓ-diversity | entropy_l_diversity | ℓ: int | |
| Recursive (c,ℓ)-diversity | recursive_c_l_diversity | c: int, ℓ: int | Not calculated if ℓ=1 |
| Basic β-likeness | basic_beta_likeness | β: float | |
| Enhanced β-likeness | enhanced_beta_likeness | β: float | |
| t-closeness | t_closeness | t: float | For numerical attributes the definition of the EMD (one-dimensional Earth Mover’s Distance) is used. For categorical attributes, the metric “Equal Distance” is used. |
| δ-disclosure privacy | delta_disclosure | δ: float | |
More information can be found in this paper.
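As an illustration of two of the parameters in the table, the following pandas-only sketch computes distinct ℓ-diversity and t-closeness for a single categorical sensitive attribute. For t-closeness it assumes the equal-distance form of the EMD, which for categorical distributions P and Q reduces to half the sum of the absolute differences of their probabilities. The function names and toy data are hypothetical; this follows the textbook definitions and is not pyCANON's own code.

```python
import pandas as pd


def l_diversity_sketch(data, quasi_identifiers, sensitive):
    """Distinct l-diversity: the smallest number of distinct sensitive
    values found in any equivalence class. Hypothetical helper."""
    distinct_per_class = data.groupby(quasi_identifiers)[sensitive].nunique()
    return int(distinct_per_class.min())


def t_closeness_categorical_sketch(data, quasi_identifiers, sensitive):
    """t-closeness for a categorical sensitive attribute: the largest
    equal-distance EMD between any class distribution and the overall
    distribution. Hypothetical helper."""
    overall = data[sensitive].value_counts(normalize=True)
    worst = 0.0
    for _, group in data.groupby(quasi_identifiers):
        # Distribution inside this class, aligned with the overall categories.
        local = (
            group[sensitive]
            .value_counts(normalize=True)
            .reindex(overall.index, fill_value=0.0)
        )
        # Equal distance: EMD = 0.5 * sum(|p_i - q_i|).
        distance = 0.5 * float((local - overall).abs().sum())
        worst = max(worst, distance)
    return worst


# Toy, already-generalized data (invented for illustration only).
toy = pd.DataFrame(
    {
        "age": ["30-40", "30-40", "30-40", "40-50", "40-50"],
        "sex": ["F", "F", "F", "M", "M"],
        "salary-class": [">50K", "<=50K", "<=50K", ">50K", ">50K"],
    }
)

l_toy = l_diversity_sketch(toy, ["age", "sex"], "salary-class")
t_toy = t_closeness_categorical_sketch(toy, ["age", "sex"], "salary-class")
print(l_toy, round(t_toy, 3))
```

In this toy table the second class contains only `>50K` records, so ℓ is 1 and that class also determines t (about 0.4): every record in it discloses the sensitive value despite the table being 2-anonymous.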
Citation
If you are using pyCANON you can cite it as follows:
@article{sainzpardo2022pycanon,
  title={A Python library to check the level of anonymity of a dataset},
  author={S{\'a}inz-Pardo D{\'\i}az, Judith and L{\'o}pez Garc{\'\i}a, {\'A}lvaro},
  journal={Scientific Data},
  volume={9},
  number={1},
  pages={785},
  year={2022},
  publisher={Nature Publishing Group UK London}
}