Crowd-Kit: Computational Quality Control for Crowdsourcing

Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses
  • metrics of uncertainty, consistency, and agreement with aggregate
  • loaders for popular crowdsourced datasets

Also, the learning subpackage contains PyTorch implementations of methods for deep learning from crowds and advanced aggregation algorithms.
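
As a quick taste of the metrics mentioned above, here is a minimal sketch; it assumes the consistency and uncertainty helpers from crowdkit.metrics.data and uses the example dataset introduced in the Getting Started section below:

from crowdkit.datasets import load_dataset
from crowdkit.metrics.data import consistency, uncertainty

# Any DataFrame with task, worker, and label columns works here.
df, _ = load_dataset('relevance-2')

print(consistency(df))  # consistency of worker responses, in [0, 1]
print(uncertainty(df))  # entropy-based uncertainty of the responses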

Installing

Installing Crowd-Kit is as easy as pip install crowd-kit. If you also want to use the learning subpackage, type pip install crowd-kit[learning].

Those who are interested in contributing to Crowd-Kit can use Pipenv to install the library with its dependencies: pipenv install --dev. We use pytest for testing.

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, you need to read your annotations into a Pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset.

df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset

Then you can aggregate the worker responses as easily as in scikit-learn:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
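
For a quick end-to-end check, here is a tiny hand-made example; the tasks, workers, and labels are made up for illustration:

toy_df = pd.DataFrame(
    [
        ['t1', 'w1', 'cat'], ['t1', 'w2', 'cat'], ['t1', 'w3', 'dog'],
        ['t2', 'w1', 'dog'], ['t2', 'w2', 'dog'], ['t2', 'w3', 'dog'],
    ],
    columns=['task', 'worker', 'label'],
)
print(DawidSkene(n_iter=100).fit_predict(toy_df))
# expected: 'cat' for t1 and 'dog' for t2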

More usage examples are available in the Crowd-Kit repository.

Implemented Aggregation Methods

Below is the list of currently implemented methods, including those already available (✅) and those in progress (🟡).

Categorical Responses

Method Status
Majority Vote ✅
One-coin Dawid-Skene ✅
Dawid-Skene ✅
Gold Majority Vote ✅
M-MSR ✅
Wawa ✅
Zero-Based Skill ✅
GLAD ✅
KOS ✅
MACE ✅
BCC 🟡
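
All of the categorical aggregators above share the same fit_predict interface, so swapping methods is a one-line change. A short sketch comparing two of them on the df from the Getting Started example, assuming the MajorityVote and GLAD classes:

from crowdkit.aggregation import GLAD, MajorityVote

mv_labels = MajorityVote().fit_predict(df)
glad_labels = GLAD().fit_predict(df)
# Fraction of tasks on which the two methods agree.
print((mv_labels == glad_labels).mean())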

Multi-Label Responses

Method Status
Binary Relevance ✅
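
Binary Relevance decomposes multi-label aggregation into one independent binary problem per label. A minimal sketch, assuming BinaryRelevance wraps a base categorical aggregator and the labels column holds lists of labels per response:

from crowdkit.aggregation import BinaryRelevance, DawidSkene

multi_df = pd.DataFrame(
    [
        ['t1', 'w1', ['cat', 'mammal']],
        ['t1', 'w2', ['cat']],
        ['t1', 'w3', ['cat', 'mammal']],
    ],
    columns=['task', 'worker', 'labels'],
)
# One aggregated list of labels per task.
print(BinaryRelevance(base_aggregator=DawidSkene(n_iter=10)).fit_predict(multi_df))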

Textual Responses

Method Status
RASA ✅
HRRASA ✅
ROVER ✅
Language Model-Based ✅
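
For textual responses, the input frame carries raw worker texts instead of categorical labels. A sketch with ROVER, assuming its constructor takes tokenizer and detokenizer callables and the frame has task, worker, and text columns:

from crowdkit.aggregation import ROVER

texts = pd.DataFrame(
    [
        ['t1', 'w1', 'the quick brown fox'],
        ['t1', 'w2', 'the quick brwon fox'],
        ['t1', 'w3', 'the quick brown fox'],
    ],
    columns=['task', 'worker', 'text'],
)
rover = ROVER(
    tokenizer=lambda text: text.split(' '),
    detokenizer=lambda tokens: ' '.join(tokens),
)
print(rover.fit_predict(texts))  # one aggregated transcription per task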

Image Segmentation

Method Status
Segmentation MV ✅
Segmentation RASA ✅
Segmentation EM ✅
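
Segmentation aggregators consume one mask per worker response. A minimal sketch, assuming the SegmentationMajorityVote class and a segmentation column of boolean NumPy arrays:

import numpy as np
from crowdkit.aggregation import SegmentationMajorityVote

# Three workers annotate the same 2x2 image; True marks the object.
masks = pd.DataFrame(
    [
        ['t1', 'w1', np.array([[True, False], [True, False]])],
        ['t1', 'w2', np.array([[True, False], [False, False]])],
        ['t1', 'w3', np.array([[True, True], [True, False]])],
    ],
    columns=['task', 'worker', 'segmentation'],
)
# Pixel-wise majority: True where at least two of the three workers agree.
print(SegmentationMajorityVote().fit_predict(masks)['t1'])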

Pairwise Comparisons

Method Status
Bradley-Terry ✅
Noisy Bradley-Terry ✅
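
Pairwise aggregators expect one row per comparison rather than per label. A sketch, assuming the BradleyTerry class and left, right, and label columns where label names the winner of each comparison:

from crowdkit.aggregation import BradleyTerry

comparisons = pd.DataFrame(
    [
        ['item_a', 'item_b', 'item_a'],
        ['item_b', 'item_c', 'item_b'],
        ['item_a', 'item_c', 'item_a'],
    ],
    columns=['left', 'right', 'label'],
)
scores = BradleyTerry(n_iter=100).fit_predict(comparisons)
print(scores.sort_values(ascending=False))  # items ranked by inferred quality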

Learning from Crowds

Method Status
CrowdLayer ✅
CoNAL ✅
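
CrowdLayer trains a classifier directly on noisy worker labels by stacking a per-worker confusion layer on top of the base model's predictions. The following is a conceptual PyTorch sketch of that idea, not Crowd-Kit's own implementation; all names are illustrative:

import torch
import torch.nn as nn

class ConceptualCrowdLayer(nn.Module):
    """Per-worker confusion matrices applied to the base model's class probabilities."""

    def __init__(self, n_classes: int, n_workers: int):
        super().__init__()
        # One n_classes x n_classes matrix per worker, initialized to the identity.
        self.confusion = nn.Parameter(torch.eye(n_classes).repeat(n_workers, 1, 1))

    def forward(self, base_probs: torch.Tensor, worker_ids: torch.Tensor) -> torch.Tensor:
        # Map the latent true-label distribution to each worker's answer distribution.
        return torch.einsum('bij,bj->bi', self.confusion[worker_ids], base_probs)

# Training sketch:
#   probs = base_model(x).softmax(dim=-1)
#   worker_probs = ConceptualCrowdLayer(n_classes, n_workers)(probs, worker_ids)
#   loss = nn.functional.nll_loss(torch.log(worker_probs + 1e-12), noisy_labels)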

Citation

@inproceedings{HCOMP2021/CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Losev, Vladimir and Giliazev, Iulian and Tulin, Evgeny},
  title     = {{A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python}},
  year      = {2021},
  booktitle = {The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track},
  series    = {HCOMP~2021},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  url       = {https://www.humancomputation.com/2021/assets/wips_demos/HCOMP_2021_paper_85.pdf},
  language  = {english},
}

Questions and Bug Reports

License

© YANDEX LLC, 2020-2022. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.

Download files

Source Distribution

crowd-kit-1.2.0.tar.gz (55.3 kB)

Built Distribution

crowd_kit-1.2.0-py3-none-any.whl (82.3 kB)
