peerannot

Crowdsourcing library

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python :: 3.8
- Python :: 3.9

Project description

A Python library for managing and learning from crowdsourced labels in image classification tasks.

The peerannot library was created to handle crowdsourced labels in classification problems.

Install

To install peerannot, run:

pip install peerannot

Alternatively, to install from source, a setup.cfg file is located at the root directory of the repository. Installing the library gives access to the command line interface through the peerannot command in a bash terminal. Try it out with:

peerannot --help

Quick start

Our library comes with files to download and install standard datasets from the crowdsourcing community. These files are located in the datasets folder. For example:

peerannot install ./datasets/cifar10H/cifar10h.py

Running aggregation strategies

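In a Jupyter notebook or IPython session, we can run classical aggregation strategies on the current dataset as follows (the ! prefix runs a shell command, and {strat} is replaced by the value of the Python variable):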

for strat in ["MV", "NaiveSoft", "DS", "GLAD", "WDS"]:
    ! peerannot aggregate . -s {strat}

This will create a new folder named labels containing the labels in the labels_cifar10H_${strat}.npy file.
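To give an intuition for the two simplest strategy names above, here is a minimal sketch of a majority vote and of a vote-frequency soft label, written against the {task: {worker: label}} format described later in this document. It is not peerannot's implementation: the toy answers, the function names majority_vote and soft_label, and the tie-breaking rule (smallest label wins) are all assumptions made for illustration.

```python
from collections import Counter

# Toy crowdsourced answers: {task_index: {worker_id: label}} (illustrative data)
answers = {
    0: {"w1": 2, "w2": 2, "w3": 0},
    1: {"w2": 1},
    2: {"w1": 0, "w3": 0, "w4": 1, "w5": 0},
}
n_classes = 3


def majority_vote(votes):
    """Return the most frequent label; ties go to the smallest label (assumption)."""
    counts = Counter(votes.values())
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


def soft_label(votes, n_classes):
    """Return the empirical distribution of the votes over the classes."""
    counts = Counter(votes.values())
    total = sum(counts.values())
    return [counts.get(k, 0) / total for k in range(n_classes)]


hard_labels = {task: majority_vote(v) for task, v in answers.items()}
soft_labels = {task: soft_label(v, n_classes) for task, v in answers.items()}
print(hard_labels)
print(soft_labels[2])
```

Hard labels keep one class per task, while soft labels keep the disagreement between workers, which can be useful when training a network on ambiguous tasks.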

Training your network

Once the labels are available, we can train a neural network with PyTorch as follows. In a Jupyter notebook or IPython session:

for strat in ["MV", "NaiveSoft", "DS", "GLAD", "WDS"]:
    ! peerannot train . -o cifar10H_{strat} \
                -K 10 \
                --labels=./labels/labels_cifar-10h_{strat}.npy \
                --model resnet18 \
                --img-size=32 \
                --n-epochs=1000 \
                --lr=0.1 --scheduler -m 100 -m 250 \
                --num-workers=8

End-to-end strategies

Finally, for the end-to-end strategies using deep learning (such as CoNAL or CrowdLayer), the command line is:

peerannot aggregate-deep . -o cifar10h_crowdlayer \
                     --answers ./answers.json \
                     --model resnet18 -K=10 \
                     --n-epochs 150 --lr 0.1 --optimizer sgd \
                     --batch-size 64 --num-workers 8 \
                     --img-size=32 \
                     -s crowdlayer

For CoNAL, the scale hyperparameter can be provided as -s CoNAL[scale=1e-4].

Peerannot and the crowdsourcing formatting

One of the goals of peerannot is to store crowdsourced datasets in a common format, so that switching from one learning or aggregation strategy to another does not require re-implementing the algorithms for each dataset.

So, what is a crowdsourced dataset? We define each dataset as:

dataset
├── train
│     ├── ...
│     ├── data as imagename-<key>.png
│     └── ...
├── val
├── test
├── dataset.py
├── metadata.json
└── answers.json

The crowdsourced labels for each training task are contained in the answers.json file. They are formatted as follows:

{
    0: {<worker_id>: <label>, <another_worker_id>: <label>},
    1: {<yet_another_worker_id>: <label>,}
}

Note that the task index in the answers.json file might not match the order of tasks in the train folder. Hence, each task’s file name contains the index of its entry in the answers.json file (the <key> in imagename-<key>.png). The number of tasks in the train folder must match the number of entry keys in the answers.json file.
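The two constraints above (the key embedded in each file name, and one answers.json entry per training task) can be checked with a short script. This is only a sketch and not part of peerannot: the helper names task_key and check_dataset are hypothetical, and it assumes that images are .png files named imagename-<key>.png, possibly nested in subfolders of train.

```python
import json
import tempfile
from pathlib import Path


def task_key(image_path):
    """Extract <key> from a file named imagename-<key>.png."""
    return Path(image_path).stem.rsplit("-", 1)[1]


def check_dataset(root):
    """Return the train keys after checking them against answers.json."""
    root = Path(root)
    with open(root / "answers.json") as f:
        answers = json.load(f)  # JSON object keys are always loaded as strings
    train_keys = {task_key(p) for p in (root / "train").rglob("*.png")}
    missing = train_keys - set(answers)  # tasks without any vote entry
    unused = set(answers) - train_keys  # vote entries without a task
    if missing or unused:
        raise ValueError(
            f"missing votes: {sorted(missing)}, unused entries: {sorted(unused)}"
        )
    return train_keys


# Build a tiny throwaway dataset to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train").mkdir()
    for name in ("cat-1.png", "dog-0.png"):
        (root / "train" / name).touch()
    toy_answers = {"0": {"7": 1}, "1": {"3": 0, "7": 0}}
    (root / "answers.json").write_text(json.dumps(toy_answers))
    checked_keys = check_dataset(root)
print(sorted(checked_keys))
```

Comparing sets of keys rather than counts reports which tasks or entries are mismatched, not only that the totals differ.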

The metadata.json file contains general information about the dataset. A minimal example would be:

{
    "name": <dataset>,
    "n_classes": K,
    "n_workers": <n_workers>,
}
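As an illustration, the minimal metadata can be derived from the crowdsourced answers and serialized with the standard json module. This is a sketch under assumptions: the helper build_metadata is hypothetical, the toy answers are made up, and n_workers is taken to be the number of distinct worker identifiers, which may differ from the convention used by a given dataset.

```python
import json


def build_metadata(name, n_classes, answers):
    """Build the minimal metadata dictionary from {task: {worker: label}}."""
    # Assumption: n_workers counts the distinct worker ids seen in the answers.
    workers = {worker for votes in answers.values() for worker in votes}
    return {"name": name, "n_classes": n_classes, "n_workers": len(workers)}


# Toy answers (illustrative data only)
answers = {
    "0": {"w1": 2, "w2": 2, "w3": 0},
    "1": {"w2": 1},
    "2": {"w1": 0, "w4": 1},
}
metadata = build_metadata("toy", 3, answers)
metadata_text = json.dumps(metadata, indent=2)
print(metadata_text)
```

The resulting text can then be written to metadata.json at the root of the dataset folder.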

Create your own dataset

The dataset.py file is not mandatory, but it facilitates the dataset’s installation procedure. A minimal example:

import json
from pathlib import Path


class mydataset:
    def __init__(self):
        self.DIR = Path(__file__).parent.resolve()
        # download the data needed
        # ...

    def setfolders(self):
        print(f"Loading data folders at {self.DIR}")
        train_path = self.DIR / "train"
        test_path = self.DIR / "test"
        valid_path = self.DIR / "val"

        # Create train/val/test tasks with matching index
        # ...

        print("Created:")
        # "split" avoids shadowing the built-in set
        for split, path in zip(
            ("train", "val", "test"), [train_path, valid_path, test_path]
        ):
            print(f"- {split}: {path}")
        self.get_crowd_labels()
        print(f"Train crowd labels are in {self.DIR / 'answers.json'}")

    def get_crowd_labels(self):
        # create the answers.json dictionary in the format presented above
        # ...
        dictionary = {}  # placeholder: {task_index: {worker_id: label}}
        with open(self.DIR / "answers.json", "w") as answ:
            json.dump(dictionary, answ, ensure_ascii=False, indent=3)

Release history

Version	Release date
0.0.1.post41 (this version)	May 2, 2024
0.0.1.post40	Apr 22, 2024
0.0.1.post39	Feb 19, 2024
0.0.1.post38	Feb 19, 2024
0.0.1.post37	Jan 19, 2024
0.0.1.post36	Jan 18, 2024
0.0.1.post35	Jan 18, 2024
0.0.1.post34	Nov 21, 2023
0.0.1.post33	Nov 13, 2023
0.0.1.post32	Nov 2, 2023
0.0.1.post31	Oct 3, 2023
0.0.1.post30	Oct 3, 2023
0.0.1.post29	Sep 27, 2023
0.0.1.post28	Sep 13, 2023
0.0.1.post27	Sep 13, 2023
0.0.1.post26	Aug 22, 2023
0.0.1.post24	Jul 24, 2023
0.0.1.post23	Jun 19, 2023
0.0.1.post22	Jun 19, 2023
0.0.1.post21	Jun 19, 2023
0.0.1.post19	Mar 31, 2023
0.0.1.post18	Mar 1, 2023
0.0.1.post17	Feb 2, 2023
0.0.1.post16	Feb 2, 2023
0.0.1.post15	Feb 2, 2023
0.0.1.post13	Jan 4, 2023
0.0.1.post12	Dec 27, 2022
0.0.1.post11	Dec 26, 2022
0.0.1.post10	Dec 23, 2022
0.0.1.post9	Dec 20, 2022
0.0.1.post8	Dec 20, 2022
0.0.1.post7	Dec 16, 2022
0.0.1.post6	Dec 15, 2022
0.0.1.post5	Dec 15, 2022
0.0.1.post4	Dec 15, 2022
0.0.1.post3	Dec 12, 2022
0.0.1.post2	Dec 12, 2022
0.0.1.post1	Dec 12, 2022
0.0.1	Dec 12, 2022
0.0.0	Nov 24, 2022

Download files

Source Distribution

peerannot-0.0.1.post41.tar.gz (45.8 kB)

Uploaded May 2, 2024 Source

Built Distribution

peerannot-0.0.1.post41-py3-none-any.whl (65.3 kB)

Uploaded May 2, 2024 Python 3

Hashes for peerannot-0.0.1.post41.tar.gz
Algorithm	Hash digest
SHA256	`025bcb60fa7cd4a5bc8f7ccfb03efe5179f066db0cf37cdcb1f4893b56d69946`
MD5	`dac841e5205fec7dd811f9b49f23c7fb`
BLAKE2b-256	`1dfffe6cb74f785d8dff5bd3451f9bf6f27ea500ce0e416b507d5052ae0a26b0`

Hashes for peerannot-0.0.1.post41-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1c629ddfb333b53f3722e037f5c45b56d794a61402c5b01211dc7b62a4032692`
MD5	`d906e6cf3367c8544ff5c17155039778`
BLAKE2b-256	`afddd505a8db42708e91123a82a3cb2a21139760795a9f20f6b494f4ebddb126`