Implementation of several preprocessing techniques for Association Rule Mining (ARM)

About

arm-preprocessing is a lightweight Python library supporting several key steps of data preparation, manipulation, and discretisation for Association Rule Mining (ARM). The framework is designed with minimalism in mind and is intended to be fully extensible, allowing easy integration with other related ARM libraries, e.g., NiaARM.

Key features

Loading various formats of datasets (CSV, JSON, TXT)
Converting datasets to different formats
Loading different types of datasets (numerical dataset, discrete dataset, time-series data, text)
Dataset identification (detecting which type of dataset has been loaded)
Dataset statistics
Discretisation methods
Data squashing methods

Usage

Data loading

The following example demonstrates how to load a dataset from a file (csv, json, txt). More examples can be found in the examples/data_loading directory:

```python
from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename (without format) and format (csv, json, txt)
dataset = Dataset('path/to/datasets', format='csv')

# Load dataset
dataset.load_data()
df = dataset.data
```
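To show what loading a file from a base filename plus a format involves, here is a minimal standard-library sketch. It is not the library's own implementation: `load_records` is a hypothetical helper, and it assumes that CSV files have a header row, that TXT files are tab-separated with a header row, and that JSON files contain a list of objects.

```python
import csv
import json
from pathlib import Path


def load_records(filename, format="csv"):
    """Hypothetical loader: read '<filename>.<format>' into a list of dicts.

    Assumptions: CSV has a header row, TXT is tab-separated with a header
    row, and JSON holds a list of objects.
    """
    path = Path(f"{filename}.{format}")
    if format == "csv":
        with path.open(newline="") as f:
            return list(csv.DictReader(f))
    if format == "txt":
        # Assumed layout: tab-separated values with a header row
        with path.open(newline="") as f:
            return list(csv.DictReader(f, delimiter="\t"))
    if format == "json":
        with path.open() as f:
            return json.load(f)
    raise ValueError(f"Unsupported format: {format}")
```

The keyword is named `format` only to mirror the README's `Dataset(..., format='csv')` call; CSV and TXT values come back as strings and would still need type conversion before discretisation.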

Data discretisation

The following example demonstrates how to discretise a dataset using the equal-width method. More examples can be found in the examples/discretisation directory:

```python
from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename (without format) and format (csv, json, txt)
dataset = Dataset('datasets/sportydatagen', format='csv')
dataset.load_data()

# Discretise dataset using equal width discretisation
dataset.discretise(method='equal_width', num_bins=5, columns=['calories'])
```
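Equal-width discretisation splits the observed range of a column into `num_bins` intervals of identical width and maps each value to the interval it falls in. The plain-Python sketch below illustrates the technique on a list of numbers; `equal_width_bins` is a hypothetical helper, not part of arm-preprocessing, and the library may label bins or treat edges differently.

```python
def equal_width_bins(values, num_bins=5):
    """Return the 0-based equal-width bin index of every value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has zero range: put every value in the first bin
        return [0] * len(values)
    width = (hi - lo) / num_bins
    # Clamp with min() so the column maximum lands in the last bin, not one past it
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


calories = [120, 250, 310, 480, 520, 700]
print(equal_width_bins(calories, num_bins=5))  # [0, 1, 1, 3, 3, 4]
```

Here the range 120 to 700 is cut into five bins of width 116, so the maximum, 700, falls exactly on the upper edge and is clamped into the last bin (index 4).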

Data squashing

The following example demonstrates how to squash a dataset using Euclidean similarity. More examples can be found in the examples/squashing directory:

```python
from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/breast', format='csv')
dataset.load()

# Squash dataset
dataset.squash(threshold=0.75, similarity='euclidean')
```
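Data squashing reduces the number of transactions by merging rows that are similar enough into a single representative row. The sketch below shows one simple greedy variant in plain Python and is not the library's algorithm. The similarity measure `1 / (1 + d)`, where `d` is the Euclidean distance between two rows, and the use of the column-wise mean as each group's representative are both assumptions; `euclidean_similarity` and `squash` are hypothetical helpers.

```python
import math


def euclidean_similarity(a, b):
    """Assumed form: map Euclidean distance to (0, 1]; identical rows score 1.0."""
    return 1.0 / (1.0 + math.dist(a, b))


def squash(rows, threshold=0.75):
    """Greedily group rows similar to a seed row and replace each group by its mean.

    Rows are equal-length tuples of numbers. A row joins the current group
    when its similarity to the group's first (seed) row is >= threshold.
    """
    used = [False] * len(rows)
    squashed = []
    for i, seed in enumerate(rows):
        if used[i]:
            continue
        group = [seed]
        used[i] = True
        for j in range(i + 1, len(rows)):
            if not used[j] and euclidean_similarity(seed, rows[j]) >= threshold:
                group.append(rows[j])
                used[j] = True
        # Replace the whole group with one representative (column-wise mean)
        squashed.append(tuple(sum(col) / len(group) for col in zip(*group)))
    return squashed


rows = [(1.0, 2.0), (1.1, 2.0), (5.0, 5.0)]
print(squash(rows, threshold=0.75))
```

With `threshold=0.75`, the first two rows (distance 0.1, similarity about 0.91) are merged into one averaged row, while `(5.0, 5.0)` stays on its own, so three transactions shrink to two. Comparing only against the seed row keeps the sketch short; other grouping rules are equally plausible.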

Missing values

The following example demonstrates how to handle missing values in a dataset using imputation. More examples can be found in the examples/missing_values directory:

```python
from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('examples/missing_values/data', format='csv')
dataset.load()

# Impute missing data
dataset.missing_values(method='impute')
```
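The README does not state which imputation strategy `method='impute'` applies. As an illustration of the general idea only, the sketch below fills each missing value (represented here as `None`) with the mean of the observed values in the same column. Mean imputation is an assumed strategy, and `impute_mean` is a hypothetical helper.

```python
def impute_mean(rows):
    """Fill None entries with the mean of the non-missing values in that column."""
    means = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        # A column with no observed values cannot be imputed; leave it as None
        means.append(sum(observed) / len(observed) if observed else None)
    return [
        tuple(means[k] if v is None else v for k, v in enumerate(row))
        for row in rows
    ]


data = [(1.0, None), (3.0, 4.0), (None, 8.0)]
print(impute_mean(data))  # [(1.0, 6.0), (3.0, 4.0), (2.0, 8.0)]
```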

Related frameworks

[1] NiaARM: A minimalistic framework for Numerical Association Rule Mining

References

[1] I. Fister, I. Fister Jr., D. Novak and D. Verber, Data squashing as preprocessing in association rule mining, 2022 IEEE Symposium Series on Computational Intelligence (SSCI), Singapore, Singapore, 2022, pp. 1720-1725, doi: 10.1109/SSCI51031.2022.10022240.

License

This package is distributed under the MIT License. This license can be found online at http://www.opensource.org/licenses/MIT.

Disclaimer

This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!

Release history

0.2.2 (Apr 2, 2024)
0.2.1 (Feb 19, 2024)
0.2.0 (Jan 31, 2024)
0.1.1 (Jan 17, 2024)
0.1.0 (Jan 5, 2024, this version)

Download files

Source Distribution

arm_preprocessing-0.1.0.tar.gz (6.8 kB), uploaded Jan 5, 2024

Built Distribution

arm_preprocessing-0.1.0-py3-none-any.whl (8.2 kB, Python 3), uploaded Jan 5, 2024

Hashes for arm_preprocessing-0.1.0.tar.gz

Algorithm	Hash digest
SHA256	`53b2def1f9fd5bf7d3d656725c6b253df1c3f385dfbccf17e544e96d6bfc390e`
MD5	`ab0ab76ed093ae8eff66219e678e986a`
BLAKE2b-256	`598ae6183c35ce570f19d28a5bd93d470f8d3a35d6bf0f26fba9a5ac23d335c4`

Hashes for arm_preprocessing-0.1.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`9dadefa795beb7f4b83b05e41e610a45059f92cbc180b368d675e346a66dd24d`
MD5	`709a806c251ba43410fcd3a2a92b73ee`
BLAKE2b-256	`f36803624e63e6b7eb4484a96ee13f3a60832bbc2dfe70907cecca8786beea3c`