Library to help analyze crowdsourcing results

crowdnalysis

Crowdsourcing Citizen Science projects usually require citizens to classify items (images, PDFs, songs, …) into one of a finite set of categories. Once an item has been classified by several citizens, their votes need to be aggregated to obtain a consensus classification. Usually this is done by selecting the most voted category. crowdnalysis allows Crowdsourcing Citizen Science projects to compute a consensus that goes beyond selecting the most voted category, by computing a model of quality for each of the citizen scientists involved in the project. This more advanced consensus results in higher-quality information for the Crowdsourcing Citizen Science project.
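For reference, the majority-voting baseline described above can be sketched in a few lines of plain Python. This is only an illustration and not the crowdnalysis implementation: the function name majority_vote and the (item, citizen, category) tuple layout are assumptions made for this example.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Return {item: most voted category} from (item, citizen, category) triples."""
    votes = defaultdict(Counter)
    for item, _citizen, category in annotations:
        votes[item][category] += 1
    # most_common(1) yields the top category; ties go to the category seen first
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


# Hypothetical annotation data, for illustration only
annotations = [
    ("img1", "ann", "flood"),
    ("img1", "bob", "flood"),
    ("img1", "eve", "fire"),
    ("img2", "ann", "fire"),
    ("img2", "bob", "fire"),
]
consensus = majority_vote(annotations)
print(consensus)
```

Every vote counts equally here, regardless of how reliable each citizen is; that is the limitation the quality models in crowdnalysis are meant to address.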

Implemented consensus algorithms

Majority Voting
Probabilistic
Multinomial
Dawid-Skene

In addition to the pure Python implementations above, the following models are implemented in the probabilistic programming language Stan by using the CmdStanPy interface:

Multinomial
Multinomial Eta
Dawid-Skene
Dawid-Skene Eta Hierarchical

~ Eta models impose that, in the error-rate (a.k.a. confusion) matrix, the probabilities of the labels are higher for the real classes.
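One way to read the Eta constraint is that, in each row of an annotator's error-rate matrix (one row per real class), the entry for the matching label is the largest. The check below is a sketch under that assumption; the helper name diagonal_dominates and the list-of-rows layout are hypothetical, and the Stan models may encode the constraint differently.

```python
def diagonal_dominates(error_rates):
    """Return True if every row gives its highest probability to its own class.

    error_rates[k][l] is the probability of reporting label l when the real
    class is k (assumed layout for this sketch).
    """
    return all(
        row[k] > p
        for k, row in enumerate(error_rates)
        for j, p in enumerate(row)
        if j != k
    )


# Hypothetical matrices, for illustration only
reliable = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.25, 0.25, 0.5]]
biased = [[0.4, 0.6], [0.1, 0.9]]  # real class 0 is mostly reported as label 1

print(diagonal_dominates(reliable), diagonal_dominates(biased))
```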

Features

Import annotation data from a CSV file with a preprocessing option
Calculate inter-rater reliability with different measures
Fit selected model to annotation data and compute the consensus
Compute the consensus with a fixed pre-determined set of parameters
Fit the model parameters provided that the consensus is already known
Given the parameters of a generative model (Multinomial, Dawid-Skene), sample annotations, tasks, and workers (i.e., annotators)
Visualise the error-rate matrix for annotators
Conduct predictive analysis of the accuracy vs. the number of annotations for a set of models
Visualise the consensus on annotated images in HTML format
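Two of the features above, computing the consensus with fixed parameters and sampling annotations from a generative model, can be illustrated with a small pure-Python sketch in the spirit of Dawid-Skene. It does not use the crowdnalysis API: the function names, the (task, worker, label) tuple layout, and the numeric error rates are all assumptions chosen for the example.

```python
import random

# error_rates[w][k][l]: probability that worker w reports label l
# when the real class is k (hypothetical values)
EXPERT = [[0.95, 0.05], [0.05, 0.95]]
NOVICE = [[0.60, 0.40], [0.40, 0.60]]
error_rates = [EXPERT, NOVICE, NOVICE]
prior = [0.5, 0.5]  # prior probability of each real class


def sample_annotations(real_classes, error_rates, rng):
    """Draw one label per (task, worker) pair from the workers' error rates."""
    annotations = []
    for task, real_class in enumerate(real_classes):
        for worker, matrix in enumerate(error_rates):
            row = matrix[real_class]
            label = rng.choices(range(len(row)), weights=row)[0]
            annotations.append((task, worker, label))
    return annotations


def posterior_consensus(annotations, error_rates, prior, n_tasks):
    """Posterior over the real class of each task, with all parameters fixed."""
    scores = [list(prior) for _ in range(n_tasks)]
    for task, worker, label in annotations:
        for k in range(len(prior)):
            # Bayes rule: multiply in the likelihood of this label under class k
            scores[task][k] *= error_rates[worker][k][label]
    return [[s / sum(row) for s in row] for row in scores]


# One task where the expert reports 1 and both novices report 0
observed = [(0, 0, 1), (0, 1, 0), (0, 2, 0)]
posterior = posterior_consensus(observed, error_rates, prior, n_tasks=1)[0]
consensus = posterior.index(max(posterior))

# Sample annotations for three tasks with known real classes
sampled = sample_annotations([0, 1, 1], error_rates, random.Random(0))
```

In this example majority voting would pick class 0, whereas the posterior puts roughly 0.89 on class 1 because the expert's vote carries more weight than the two novices' votes.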

Quick start

crowdnalysis is distributed via PyPI: https://pypi.org/project/crowdnalysis/

Install as a standard Python package:

$ pip install crowdnalysis

CmdStanPy will be installed as a dependency; however, this package also requires the CmdStan command-line interface to be installed. This can be done by running the install_cmdstan utility that comes with CmdStanPy. See the package docs for more information.

$ install_cmdstan
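To confirm from Python that CmdStan can be located after installation, something along these lines may help. It is a sketch that relies on CmdStanPy's cmdstan_path() function and assumes that it raises ValueError when no installation is found; the helper name find_cmdstan is hypothetical.

```python
def find_cmdstan():
    """Return the CmdStan installation path, or None if it cannot be located."""
    try:
        import cmdstanpy
    except ImportError:
        return None  # CmdStanPy itself is not installed
    try:
        return cmdstanpy.cmdstan_path()
    except ValueError:
        # Assumed behaviour: no CmdStan installation was found
        return None


path = find_cmdstan()
print(path if path else "CmdStan not found; try running install_cmdstan")
```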

Use the package in your code:

import crowdnalysis

Check available consensus models:

print(crowdnalysis.factory.Factory.list_registered_algorithms())

How to run unit tests

We use pytest as the testing framework. Tests can be run with:

$ pytest

To see the logs of the execution, run:

$ pytest --log-cli-level 0

Logging

We use the standard logging library according to the rules here.
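Because the standard logging library is used, an application can typically control the library's log output through the usual logger hierarchy. The sketch below assumes the library's loggers live under the name "crowdnalysis" (as they would if modules call logging.getLogger(__name__)); that name is an assumption and is not confirmed here.

```python
import logging

# Application-wide default: only warnings and above
logging.basicConfig(
    level=logging.WARNING,
    format="%(name)s %(levelname)s: %(message)s",
)

# Assumed logger name; lowers the threshold for this package only
logger = logging.getLogger("crowdnalysis")
logger.setLevel(logging.DEBUG)
```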

License

This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE file for details.

Acknowledgements

crowdnalysis is being developed within the Crowd4SDG project funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 872944.

Reference

For the details of the conceptual and mathematical model of crowdnalysis, see:

[1] Cerquides, J.; Mülâyim, M.O.; Hernández-González, J.; Ravi Shankar, A.; Fernandez-Marquez, J.L. A Conceptual Probabilistic Framework for Annotation Aggregation of Citizen Science Data. Mathematics 2021, 9, 875. https://doi.org/10.3390/math9080875
