honest

...

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language

Project description

HONEST: Measuring Hurtful Sentence Completion in Language Models

https://img.shields.io/pypi/v/honest.svg

https://img.shields.io/travis/MilaNLProc/honest.svg

…

Large language models (LLMs) have revolutionized the field of NLP. However, LLMs capture and proliferate hurtful stereotypes, especially in text generation. We propose HONEST, a score to measure hurtful sentence completions in language models. It uses a systematic template- and lexicon-based bias evaluation methodology in six languages (English, Italian, French, Portuguese, Romanian, and Spanish) for binary gender and in English for LGBTQAI+ individuals.

…

See the papers for additional details:

Nozza D., Bianchi F., and Hovy D. “HONEST: Measuring hurtful sentence completion in language models.” The 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2021. https://aclanthology.org/2021.naacl-main.191

Nozza D., Bianchi F., Lauscher L., and Hovy D. “Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals.” The Second Workshop on Language Technology for Equality, Diversity and Inclusion at the Annual Meeting of the Association for Computational Linguistics 2022. https://milanlproc.github.io/publication/2022-honest-hurtful-language-model-lgbtqia+/

Tutorials

Name	Link
Compute HONEST score (+Viz) (stable v0.2.0)

Installing

pip install -U honest

Using

# Load HONEST templates
evaluator = honest.HonestEvaluator(lang)
masked_templates = evaluator.templates(data_set="binary") # or "queer_nonqueer" or "all"

# Load BERT model
tokenizer = AutoTokenizer.from_pretrained(name_model)
model = AutoModelForMaskedLM.from_pretrained(name_model)

# Define nlp_fill pipeline
nlp_fill = pipeline('fill-mask', model=model, tokenizer=tokenizer, top_k=k)

print("FILL EXAMPLE:",nlp_fill('all women likes to [M].'.replace('[M]',tokenizer.mask_token)))

# Fill templates (please check if the filled words contain any special character)
filled_templates = [[fill['token_str'].strip() for fill in nlp_fill(masked_sentence.replace('[M]',tokenizer.mask_token))] for masked_sentence in masked_templates.keys()]

honest_score = evaluator.honest(filled_templates)
print(name_model, k, honest_score)

Citation

Please use the following bibtex entries if you use this score in your project:

@inproceedings{nozza-etal-2021-honest,
title = {"{HONEST}: Measuring Hurtful Sentence Completion in Language Models"},
author = "Nozza, Debora and Bianchi, Federico  and Hovy, Dirk",
booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
month = jun,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.naacl-main.191",
doi = "10.18653/v1/2021.naacl-main.191",
pages = "2398--2406",
}

@inproceedings{nozza-etal-2022-measuring,
    title = {Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals},
    author = "Nozza, Debora and Bianchi, Federico and Lauscher, Anne and Hovy, Dirk",
    booktitle = "Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion",
    publisher = "Association for Computational Linguistics",
    year={2022}
}

Development Team

Federico Bianchi <f.bianchi@unibocconi.it> Bocconi University
Debora Nozza <debora.nozza@unibocconi.it> Bocconi University
Dirk Hovy <dirk.hovy@unibocconi.it> Bocconi University

Software Details

Free software: MIT license
Documentation: https://honest.readthedocs.io.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Note

Remember that this is a research tool :)

History

0.1.0 (2022-01-25)

First release on PyPI.

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language

Release history Release notifications | RSS feed

This version

0.2.1

May 6, 2022

0.2.0

May 3, 2022

0.1.2

Apr 25, 2022

0.1.0

Feb 22, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

honest-0.2.1.tar.gz (8.0 kB view hashes)

Uploaded May 6, 2022 Source

Built Distribution

honest-0.2.1-py2.py3-none-any.whl (6.2 kB view hashes)

Uploaded May 6, 2022 Python 2 Python 3

Hashes for honest-0.2.1.tar.gz

Hashes for honest-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`06cf98505d71b0096ea6f59a8cfd56d8bbe38d6ec139b3d82498f780c03d2a0b`
MD5	`189723503eb22946d572ba02e350af82`
BLAKE2b-256	`97fae89b13a568899fe4e84a7198510b9a89da73187379dbbed304f0c65ace6d`

Hashes for honest-0.2.1-py2.py3-none-any.whl

Hashes for honest-0.2.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`44d934e495f7bdacf3a8e84434f6b17f3eae188dfff2f4f036db8e3930212415`
MD5	`3c78d6cd31c9cebd7f92ec8cb801b8f1`
BLAKE2b-256	`85b04612f975924aa0539c41a09e3ad9fcb42ae17ca143831c64b3e6f9d9d3d2`