Annotation error detection and correction


nessie is a package for annotation error detection. It can be used to automatically detect errors in annotated corpora so that human annotators can concentrate on correcting a subset instead of needing to look at each and every instance.

💡 Please also refer to our additional documentation! It contains detailed explanations and code examples!

Contact person: Jan-Christoph Klie
https://www.ukp.tu-darmstadt.de
https://www.tu-darmstadt.de

Don't hesitate to report an issue if something is broken (and it shouldn't be) or if you have further questions.

⚠️ This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Please use the following citation when using our software:

@misc{https://doi.org/10.48550/arxiv.2206.02280,
  doi = {10.48550/ARXIV.2206.02280},  
  url = {https://arxiv.org/abs/2206.02280},  
  author = {Klie, Jan-Christoph and Webber, Bonnie and Gurevych, Iryna},  
  title = {Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future},  
  publisher = {arXiv},  
  year = {2022}
}

Installation

pip install nessie

This installs the package with its default dependencies and a CPU-only build of PyTorch. If you want to use your own PyTorch version (e.g., with CUDA enabled), you need to install it manually afterwards. The same goes for faiss-gpu: if you need it, install it manually as well.
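
For example, a CUDA 11.3 build of PyTorch can be installed from the PyTorch wheel index (the version pin and index URL here are illustrative; check the PyTorch installation instructions for your setup):

pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113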

Basic Usage

Given annotated data, this package can be used to find potential errors. For instance, Retag trains a model, lets it predict on your data, and flags every instance whose prediction disagrees with the given label:

from nessie.dataloader import load_example_text_classification_data
from nessie.helper import CrossValidationHelper
from nessie.models.text import DummyTextClassifier
from nessie.detectors import Retag

# Load the bundled example dataset and keep the first 100 instances
text_data = load_example_text_classification_data().subset(100)

# Obtain out-of-sample predictions via 10-fold cross-validation
cv = CrossValidationHelper(n_splits=10)
tc_result = cv.run(text_data.texts, text_data.noisy_labels, DummyTextClassifier())

# Flag instances whose predicted label disagrees with the given (noisy) label
detector = Retag()
flags = detector.score(text_data.noisy_labels, tc_result.predictions)
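
The returned flags line up one-to-one with the input instances, so they can be used directly to pull out the suspicious items for manual review. A minimal sketch, assuming flags behaves like a sequence of booleans:

suspicious = [text for text, flagged in zip(text_data.texts, flags) if flagged]
print(f"{len(suspicious)} of {len(text_data.texts)} instances flagged for review")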


Methods

We implement a wide range of annotation error detection methods. These are divided into two categories: flaggers and scorers. Flaggers give a binary judgement of whether an instance is considered wrong; scorers give a certainty estimate of how likely it is that an instance is wrong.
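
To make the distinction concrete, this is what the two output types look like for a three-instance dataset (values invented for illustration):

# Flagger: one boolean judgement per instance (True = considered wrong)
flags = [True, False, False]
# Scorer: one score per instance (higher = more likely to be wrong)
scores = [0.92, 0.11, 0.35]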

Flagger

| Abbreviation | Method | Proposed by |
| ------------ | ------ | ----------- |
| CL | Confident Learning | Northcutt (2021) |
| CS | Curriculum Spotter | Amiri (2018) |
| DE | Diverse Ensemble | Loftsson (2009) |
| IRT | Item Response Theory | Rodriguez (2021) |
| LA | Label Aggregation | Amiri (2018) |
| LS | Leitner Spotter | Amiri (2018) |
| PE | Projection Ensemble | Reiss (2020) |
| RE | Retag | van Halteren (2000) |
| VN | Variation n-Grams | Dickinson (2003) |

Scorer

| Abbreviation | Method | Proposed by |
| ------------ | ------ | ----------- |
| BC | Borda Count | Larson (2020) |
| CU | Classification Uncertainty | Hendrycks (2017) |
| DM | Data Map Confidence | Swayamdipta (2020) |
| DU | Dropout Uncertainty | Amiri (2018) |
| KNN | k-Nearest Neighbor Entropy | Grivas (2020) |
| LE | Label Entropy | Hollenstein (2016) |
| MD | Mean Distance | Larson (2019) |
| PM | Prediction Margin | Dligach (2011) |
| WD | Weighted Discrepancy | Hollenstein (2016) |
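
Scores are typically used to rank instances so that annotators inspect the most suspicious ones first. A minimal sketch using numpy, with placeholder scores:

import numpy as np

scores = np.array([0.92, 0.11, 0.35])  # hypothetical scorer output
ranking = np.argsort(-scores)          # instance indices, most suspicious first
print(ranking)                         # -> [0 2 1]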

Models

Model-based annotation error detection methods need trained models to obtain predictions or probabilities. We have already implemented the most common models, ready to use. You can add your own models by implementing the respective abstract class, TextClassifier or SequenceTagger. We provide the following models:

Text classification

| Class name | Description |
| ---------- | ----------- |
| FastTextTextClassifier | fastText |
| FlairTextClassifier | Flair |
| LgbmTextClassifier | LightGBM with handcrafted features |
| LgbmTextClassifier | LightGBM with S-BERT features |
| MaxEntTextClassifier | Logistic regression with handcrafted features |
| MaxEntTextClassifier | Logistic regression with S-BERT features |
| TransformerTextClassifier | Transformers |

You can easily add your own sklearn classifiers by subclassing SklearnTextClassifier like the following:

from sklearn.linear_model import LogisticRegression

class MaxEntTextClassifier(SklearnTextClassifier):
    def __init__(self, embedder: SentenceEmbedder, max_iter=10000):
        # The lambda defers model creation so that a fresh LogisticRegression is built per training run
        super().__init__(lambda: LogisticRegression(max_iter=max_iter, random_state=RANDOM_STATE), embedder)
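
A classifier defined this way can then be plugged into the cross-validation helper from Basic Usage. The embedder below is a placeholder: substitute any concrete SentenceEmbedder implementation from nessie (see the documentation for the available ones):

embedder = ...  # any SentenceEmbedder implementation, e.g. an S-BERT based one
model = MaxEntTextClassifier(embedder)
result = cv.run(text_data.texts, text_data.noisy_labels, model)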

Sequence tagging

| Class name | Description |
| ---------- | ----------- |
| FlairSequenceTagger | Flair |
| CrfSequenceTagger | CRF with handcrafted features |
| MaxEntSequenceTagger | Maxent sequence tagger |
| TransformerSequenceTagger | Transformers |

Development

We use flit for dependency management and packaging. Follow their documentation to install it. Then you can run

flit install -s

to install the dependencies and the package itself as a symlinked (editable) install. To use your own PyTorch build with CUDA support, you can run

make force-cuda113

or install it manually in your environment. You can format the code via

make format

which should be run before every commit.

Bibliography

Amiri, Hadi, Timothy Miller, and Guergana Savova. 2018. "Spotting Spurious Data with Neural Networks." Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2006-16. New Orleans, Louisiana.

Dickinson, Markus, and W. Detmar Meurers. 2003. "Detecting Errors in Part-of-Speech Annotation." Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003). Budapest, Hungary.

Dligach, Dmitriy, and Martha Palmer. 2011. "Reducing the Need for Double Annotation." Proceedings of the 5th Linguistic Annotation Workshop, 65-73. Portland, Oregon, USA.

Grivas, Andreas, Beatrice Alex, Claire Grover, Richard Tobin, and William Whiteley. 2020. "Not a Cute Stroke: Analysis of Rule- and Neural Network-based Information Extraction Systems for Brain Radiology Reports." Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis, 24-37. Online.

Hendrycks, Dan, and Kevin Gimpel. 2017. "A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks." Proceedings of International Conference on Learning Representations, 1-12.

Hollenstein, Nora, Nathan Schneider, and Bonnie Webber. 2016. "Inconsistency Detection in Semantic Annotation." Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 3986-90. Portorož, Slovenia.

Larson, Stefan, Anish Mahendran, Andrew Lee, Jonathan K. Kummerfeld, Parker Hill, Michael A. Laurenzano, Johann Hauswald, Lingjia Tang, and Jason Mars. 2019. "Outlier Detection for Improved Data Quality and Diversity in Dialog Systems." Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 517-27. Minneapolis, Minnesota.

Larson, Stefan, Adrian Cheung, Anish Mahendran, Kevin Leach, and Jonathan K. Kummerfeld. 2020. "Inconsistencies in Crowdsourced Slot-Filling Annotations: A Typology and Identification Methods." Proceedings of the 28th International Conference on Computational Linguistics. Online.

Loftsson, Hrafn. 2009. "Correcting a POS-Tagged Corpus Using Three Complementary Methods." Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), 523-31. Athens, Greece.

Northcutt, Curtis, Lu Jiang, and Isaac Chuang. 2021. "Confident Learning: Estimating Uncertainty in Dataset Labels." Journal of Artificial Intelligence Research 70 (April): 1373-1411.

Reiss, Frederick, Hong Xu, Bryan Cutler, Karthik Muthuraman, and Zachary Eichenberger. 2020. "Identifying Incorrect Labels in the CoNLL-2003 Corpus." Proceedings of the 24th Conference on Computational Natural Language Learning, 215-26. Online.

Rodriguez, Pedro, Joe Barrow, Alexander Miserlis Hoyle, John P. Lalor, Robin Jia, and Jordan Boyd-Graber. 2021. "Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?" Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 4486-4503. Online.

Swayamdipta, Swabha, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, and Yejin Choi. 2020. "Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics." Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 9275-93. Online.

van Halteren, Hans. 2000. "The Detection of Inconsistency in Manually Tagged Text." Proceedings of the COLING-2000 Workshop on Linguistically Interpreted Corpora, 48-55. Luxembourg.
