
Project description

Conditional Evidence Stream Generator

Data stream generator for Certainty-based Domain Selection Framework for TinyML Devices paper.

Installation guide

Installation is straightforward. Either run make install in the main directory of this repository, or use pip for the current stable version:

pip install cesg

Processing example

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import balanced_accuracy_score
import cesg

# Define parameters
n_cycles = 3
n_chunks = 1000
chunk_size = 200
random_state = 1410
n_concepts = 500
modes = {
    'instant': {'mode': 'instant'},
    'linear': {'mode': 'linear'},
    'normal': {'mode': 'normal', 'sigma': 1},
}

# Prepare data
X, y = make_classification(n_samples=10000)

# Transform to components
X_pca = cesg.utils.normalized_PCA(X)

# Prepare factor
factor = cesg.utils.mix_to_factor(X_pca)

# Prepare condition map
condition_map = cesg.utils.make_condition_map(n_cycles=n_cycles,
                                              n_concepts=n_concepts,
                                              factor=factor,
                                              factor_range=(0.1, 0.9))

# Calculate concept proba
concept_probabilities = cesg.concepts.concept_proba(n_concepts=n_concepts,
                                                    n_chunks=n_chunks,
                                                    normalize=True,
                                                    **modes['normal'])

# Initialize stream
stream = cesg.ConditionalEvidenceStream(X, y,
                                        condition_map.T,
                                        concept_probabilities,
                                        chunk_size=chunk_size,
                                        fragile=False,
                                        random_state=random_state)

# Iterate stream and report scores
clf = MLPClassifier()
scores = []

while chunk := stream.get_chunk():
    X, y = chunk

    if stream.chunk_idx > 1:
        y_pred = clf.predict(X)
        score = balanced_accuracy_score(y, y_pred)
        
        scores.append(score)
    
    clf.partial_fit(X, y, classes=stream.classes_)
    
print(scores)

Generation procedure

The streams were synthesized using an original generator based on conditional evidence. The input of the stream synthesis procedure is a stationary data set $DS$.

The first processing step is to determine the factor $F$ of the set: a value in the range $0$-$1$, correlated with object difficulty and determined for each object of the $DS$ data set. To estimate the $F$ factor:

  1. Transform $DS$ to its components $DS'$, using Principal Component Analysis, leaving 80% of the explained variance and standardizing the result.
  2. Model a Gaussian Mixture for $DS'$ with an assumption of 10 mixture components, assuming that each component has its own single variance.
  3. Estimate the density of the Gaussian Mixture distribution for each point of $DS'$, remembering that support is estimated separately for each component of the mixture.
  4. Quantile-normalize the obtained density to a uniform distribution along the object axis -- independently in each component.
  5. Flatten the obtained representation with the sum of components and perform another quantile normalization to uniform distribution, so that for each point from the original set its mapping to the $F$ factor is obtained.
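
The five steps above can be sketched roughly as follows. This is a hypothetical reconstruction using scikit-learn and scipy, not the cesg implementation (the library wraps the procedure in cesg.utils.mix_to_factor); the helper name quantile_normalize and the spherical covariance choice are assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal, rankdata
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

def quantile_normalize(a, axis=0):
    # Rank-based mapping of values onto a uniform distribution (assumed helper).
    return (rankdata(a, axis=axis) - 0.5) / a.shape[axis]

X, _ = make_classification(n_samples=1000, random_state=1410)

# 1. PCA keeping 80% of explained variance, standardized afterwards.
X_pca = StandardScaler().fit_transform(PCA(n_components=0.8).fit_transform(X))

# 2. Gaussian Mixture with 10 components, each with its own single variance.
gmm = GaussianMixture(n_components=10, covariance_type='spherical',
                      random_state=1410).fit(X_pca)

# 3. Density of every mixture component at every point of DS'.
density = np.column_stack([
    multivariate_normal.pdf(X_pca, mean=gmm.means_[k], cov=gmm.covariances_[k])
    for k in range(gmm.n_components)])

# 4. Quantile-normalize along the object axis, independently per component.
uniform = quantile_normalize(density, axis=0)

# 5. Flatten by summing components, then quantile-normalize again -> F factor.
F = quantile_normalize(uniform.sum(axis=1))
```

The resulting $F$ is approximately uniform on $0$-$1$, so thresholding it at any level selects a predictable fraction of the objects.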

Having the vector of factors $F$, it is possible to determine the conditional map $CM$. It informs the generator about the availability of each $DS$ object for each metaconcept building the data stream. Here it is possible to configure the number of metaconcepts ($m$), the number of difficulty oscillation cycles ($c$) and the thresholding range of the difficulty factor ($r$). To obtain the conditional map $CM$:

  1. Build a condition basis vector as an interval-normalized ($0$-$1$) sampling of a sinusoid at $m$ points over the period from $0$ to $2\pi c$. Scale the result to the thresholding range $r$.
  2. Calculate the conditional map $CM$ by comparing the condition basis vector with the vector $F$, obtaining a logical matrix that indicates whether the $F$ factor of a given object exceeds the metaconcept threshold value.
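
An illustrative sketch of these two steps, under assumed parameter names $m$, $c$, $r$ and a random stand-in for $F$ (in cesg this is handled by utils.make_condition_map):

```python
import numpy as np

m = 500                      # number of metaconcepts
c = 3                        # difficulty oscillation cycles
r = (0.1, 0.9)               # thresholding range of the difficulty factor
F = np.random.default_rng(1410).uniform(size=1000)  # stand-in F factors

# 1. Sample a sinusoid at m points over the period 0..2*pi*c,
#    min-max normalize it to 0-1, then scale into the range r.
basis = np.sin(np.linspace(0, 2 * np.pi * c, m))
basis = (basis - basis.min()) / (basis.max() - basis.min())
basis = r[0] + basis * (r[1] - r[0])

# 2. Compare every object's F factor with every metaconcept threshold,
#    producing a logical (m x n_objects) availability matrix.
CM = F[None, :] > basis[:, None]
```

Because $F$ is uniform, a threshold of e.g. 0.9 leaves roughly 10% of the objects available, so the stream alternates between broad and narrow object pools over the $c$ cycles.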

The final, third component of the processing metadata is the metaconcept probability map ($CP$). It informs the generator about the probability of selecting an object from a given metaconcept in a given batch of the generated stream. It is calculated according to instant, linear or normal dynamics, in accordance with the standard procedure for generating data streams.
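
A rough illustration of how the instant and normal dynamics could look; the Gaussian smoothing below is an assumption for illustration, not the cesg implementation (see cesg.concepts.concept_proba for the real one):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

n_concepts, n_chunks = 4, 200

# 'instant': every chunk is drawn entirely from a single metaconcept.
active = (np.arange(n_chunks) * n_concepts) // n_chunks
CP = np.zeros((n_chunks, n_concepts))
CP[np.arange(n_chunks), active] = 1.0

# 'normal': blur the hard assignment with a Gaussian along the chunk
# axis, then renormalize so each chunk's probabilities sum to one.
CP_normal = gaussian_filter1d(CP, sigma=5, axis=0)
CP_normal /= CP_normal.sum(axis=1, keepdims=True)
```

Under the normal dynamics, concept transitions become gradual drifts spread over several chunks instead of abrupt switches.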

To establish a data stream, it is necessary to pass $DS$, $F$, $CM$, and $CP$ to the ConditionalEvidenceStream control object. For each subsequent batch, it uses only the objects permitted by the conditional map $CM$ for the metaconcept that $CP$ assigns to that batch.
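
A minimal sketch of that selection logic for a single batch, with random stand-ins for $CM$ and the $CP$ row; this is an assumed reading of the procedure, not the actual ConditionalEvidenceStream internals:

```python
import numpy as np

rng = np.random.default_rng(1410)
n_objects, n_concepts, chunk_size = 1000, 5, 50

X = rng.normal(size=(n_objects, 8))
y = rng.integers(0, 2, size=n_objects)
CM = rng.uniform(size=(n_concepts, n_objects)) > 0.5  # stand-in availability
cp_row = rng.dirichlet(np.ones(n_concepts))           # CP row for one chunk

concept = rng.choice(n_concepts, p=cp_row)            # pick the metaconcept
allowed = np.flatnonzero(CM[concept])                 # objects it permits
sel = rng.choice(allowed, size=chunk_size, replace=True)
X_chunk, y_chunk = X[sel], y[sel]
```

Every object in the emitted chunk is thus guaranteed to satisfy the availability condition of the metaconcept drawn for that batch.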
