Python libraries for crowdsourcing
Crowd-Kit: Computational Quality Control for Crowdsourcing
Crowd-Kit is a powerful Python library that implements commonly used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.
Currently, Crowd-Kit contains:
- implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses
- metrics of uncertainty, consistency, and agreement with aggregate
- loaders for popular crowdsourced datasets
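To make the idea of an uncertainty metric concrete, here is a minimal sketch that averages the entropy of the label distribution over tasks. It uses only the standard library. The function names `label_entropy` and `mean_task_uncertainty` are hypothetical and are not part of Crowd-Kit, whose own metrics may be defined differently.

```python
import math
from collections import Counter, defaultdict


def label_entropy(labels):
    """Shannon entropy (natural log) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def mean_task_uncertainty(records):
    """Average per-task label entropy over (task, performer, label) triples."""
    by_task = defaultdict(list)
    for task, _performer, label in records:
        by_task[task].append(label)
    return sum(label_entropy(v) for v in by_task.values()) / len(by_task)


# Toy data: performers agree on t1 and split evenly on t2.
records = [
    ("t1", "p1", "a"), ("t1", "p2", "a"),
    ("t2", "p1", "a"), ("t2", "p2", "b"),
]
uncertainty = mean_task_uncertainty(records)
```

A value of zero means every task received unanimous labels. Larger values indicate more disagreement among performers.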
The library is under heavy development, and its interfaces are subject to change.
Installing
Installing Crowd-Kit is as easy as pip install crowd-kit
Getting Started
This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.
First, let us do all the necessary imports.
from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset
import pandas as pd
Then read your annotations into a pandas DataFrame with the columns task, performer, and label. Alternatively, you can download an example dataset.
df = pd.read_csv('results.csv') # should contain columns: task, performer, label
# df, ground_truth = load_dataset('relevance-2') # or download an example dataset
Then you can aggregate the performer responses as easily as in scikit-learn:
aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
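For readers who want to see what the Dawid-Skene algorithm does conceptually, the following is a simplified pure-Python sketch of its expectation-maximization loop. It is not Crowd-Kit's implementation. The function name `dawid_skene_sketch`, the smoothing constant `eps`, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def dawid_skene_sketch(records, n_iter=20, eps=1e-9):
    """Simplified Dawid-Skene EM over (task, performer, label) triples."""
    labels = sorted({label for _, _, label in records})
    by_task = defaultdict(list)
    for task, performer, label in records:
        by_task[task].append((performer, label))

    # Initialise each task's label distribution with its vote fractions.
    probs = {
        task: {l: sum(1 for _, a in answers if a == l) / len(answers) for l in labels}
        for task, answers in by_task.items()
    }

    for _ in range(n_iter):
        # M-step: label priors and one confusion matrix per performer.
        priors = {l: sum(p[l] for p in probs.values()) / len(probs) for l in labels}
        confusion = defaultdict(lambda: {t: {o: eps for o in labels} for t in labels})
        for task, answers in by_task.items():
            for performer, observed in answers:
                for true in labels:
                    confusion[performer][true][observed] += probs[task][true]
        for matrix in confusion.values():
            for true in labels:
                row_total = sum(matrix[true].values())
                for observed in labels:
                    matrix[true][observed] /= row_total

        # E-step: recompute each task's posterior over the true label.
        for task, answers in by_task.items():
            scores = {}
            for true in labels:
                score = priors[true]
                for performer, observed in answers:
                    score *= confusion[performer][true][observed]
                scores[true] = score
            total = sum(scores.values())
            probs[task] = {l: s / total for l, s in scores.items()}

    # Report the most probable label for every task.
    return {task: max(p, key=p.get) for task, p in probs.items()}


# Toy data: p1 and p2 always agree, p3 deviates on t1 and t4.
records = [
    ("t1", "p1", "cat"), ("t1", "p2", "cat"), ("t1", "p3", "dog"),
    ("t2", "p1", "dog"), ("t2", "p2", "dog"), ("t2", "p3", "dog"),
    ("t3", "p1", "cat"), ("t3", "p2", "cat"), ("t3", "p3", "cat"),
    ("t4", "p1", "dog"), ("t4", "p2", "dog"), ("t4", "p3", "cat"),
]
result = dawid_skene_sketch(records)
```

The sketch multiplies raw probabilities, which is fine for a handful of answers per task. A production implementation would typically work in log space to avoid underflow.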
Implemented Aggregation Methods
Categorical Responses
Method | Status |
---|---|
Majority Vote | ✅ |
Dawid-Skene | ✅ |
Gold Majority Vote | ✅ |
M-MSR | ✅ |
Wawa | ✅ |
Zero-Based Skill | ✅ |
GLAD | 🟡 |
BCC | 🟡 |
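The two simplest ideas behind the methods in this table can be sketched in a few lines of standard-library Python: a plain majority vote, and a vote weighted by each performer's agreement with that majority, which is the idea usually associated with Wawa. The helper names below are hypothetical, and the details of Crowd-Kit's own classes may differ.

```python
from collections import Counter, defaultdict


def majority_vote(records, weights=None):
    """Pick the label with the highest (optionally weighted) vote per task."""
    votes = defaultdict(Counter)
    for task, performer, label in records:
        votes[task][label] += 1.0 if weights is None else weights[performer]
    # Ties go to the label that was seen first for the task.
    return {task: max(c, key=c.get) for task, c in votes.items()}


def agreement_skills(records, aggregate):
    """Fraction of each performer's answers that match the aggregate labels."""
    hits, totals = Counter(), Counter()
    for task, performer, label in records:
        totals[performer] += 1
        hits[performer] += label == aggregate[task]
    return {p: hits[p] / totals[p] for p in totals}


# Toy data with three performers and three tasks.
records = [
    ("t1", "p1", "a"), ("t1", "p2", "a"), ("t1", "p3", "b"),
    ("t2", "p1", "b"), ("t2", "p2", "b"), ("t2", "p3", "b"),
    ("t3", "p1", "a"), ("t3", "p2", "b"), ("t3", "p3", "b"),
]
plain = majority_vote(records)
skills = agreement_skills(records, plain)
weighted = majority_vote(records, skills)
```

On this toy data the weighted vote returns the same labels as the plain vote. The two can diverge when a few reliable performers disagree with a larger group of unreliable ones.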
Textual Responses
Method | Status |
---|---|
RASA | ✅ |
HRRASA | ✅ |
ROVER | 🟡 |
Image Segmentation
Method | Status |
---|---|
Segmentation MV | ✅ |
Segmentation RASA | 🟡 |
Segmentation EM | 🟡 |
Pairwise Comparisons
Method | Status |
---|---|
Bradley-Terry | ✅ |
Noisy Bradley-Terry | ✅ |
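As an illustration of pairwise aggregation, the sketch below fits Bradley-Terry scores with the classical minorization-maximization update. It is a didactic example, not Crowd-Kit's code. The function name `bradley_terry_sketch` and the `(winner, loser)` input format are assumptions, and the sketch expects every item to win at least once.

```python
from collections import defaultdict


def bradley_terry_sketch(comparisons, n_iter=100):
    """Estimate item scores from a list of (winner, loser) pairs."""
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    # Start from uniform scores and apply the MM update repeatedly.
    scores = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(
                pair_counts[frozenset((i, j))] / (scores[i] + scores[j])
                for j in items
                if j != i and frozenset((i, j)) in pair_counts
            )
            updated[i] = wins[i] / denom if denom else scores[i]
        # Normalise so the scores sum to one.
        total = sum(updated.values())
        scores = {i: s / total for i, s in updated.items()}
    return scores


# Toy data: a usually beats b and c, and b usually beats c.
comparisons = (
    [("a", "b")] * 3 + [("b", "a")]
    + [("b", "c")] * 3 + [("c", "b")]
    + [("a", "c")] * 3 + [("c", "a")]
)
scores = bradley_terry_sketch(comparisons)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting the items by score yields the aggregated ranking, with higher scores indicating items that tend to win their comparisons.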
Questions and Bug Reports
To report bugs, please use the Toloka/bugreport page.
License
© YANDEX LLC, 2020-2021. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.