fact-checking

Check a claim's consistency against the provided evidence

These details have not been verified by PyPI

Project links

Homepage


Project description

Fact checking

This generative model, trained on FEVER, predicts whether a claim is consistent with the provided evidence.

Installation and simple usage

The quickest way to install it is with pip:

pip install fact_checking

and then use the following code:

from transformers import (
    GPT2LMHeadModel,
    GPT2Tokenizer,
)

from fact_checking import FactChecker

_evidence = """
Justine Tanya Bateman (born February 19, 1966) is an American writer, producer, and actress . She is best known for her regular role as Mallory Keaton on the sitcom Family Ties (1982 -- 1989). Until recently, Bateman ran a production and consulting company, SECTION 5 . In the fall of 2012, she started studying computer science at UCLA.
"""

_claim = 'Justine Bateman is a poet.'

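# Tokenizer of the base GPT-2 model plus the fine-tuned fact-checking weights from the Hugging Face Hub
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
fact_checking_model = GPT2LMHeadModel.from_pretrained('fractalego/fact-checking')
fact_checker = FactChecker(fact_checking_model, tokenizer)
# True if the claim is consistent with the evidence, False otherwise
is_claim_true = fact_checker.validate(_evidence, _claim)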

print(is_claim_true)

which gives the output

False

Probabilistic output with replicas

The output can also be probabilistic. In this mode the answer is generated several times, producing an ensemble of answers that are grouped into Yes (Y) and No (N); the result reports the fraction of answers in each group.
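The grouping step can be pictured with a small sketch. It assumes each replica has already been reduced to a single "Y" or "N" label (the same keys that appear in the `validate_with_replicas` output); the helper name `group_replicas` and the 20-answer ensemble are hypothetical and not part of the fact_checking API.

```python
from collections import Counter
from typing import Dict, List


def group_replicas(answers: List[str]) -> Dict[str, float]:
    """Turn sampled 'Y'/'N' answers into the fraction of each.

    Hypothetical helper: any answer other than 'Y' or 'N' is ignored.
    """
    counts = Counter(answers)
    total = counts["Y"] + counts["N"]
    if total == 0:
        raise ValueError("no Y/N answers to group")
    return {label: counts[label] / total for label in ("Y", "N")}


# Hypothetical ensemble of 20 sampled answers: 19 "Y" and 1 "N".
sampled = ["Y"] * 19 + ["N"]
print(group_replicas(sampled))  # {'Y': 0.95, 'N': 0.05}
```

Returning fractions rather than raw counts keeps the result comparable across different numbers of replicas.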

For example, one can ask

from transformers import (
    GPT2LMHeadModel,
    GPT2Tokenizer,
)

from fact_checking import FactChecker

_evidence = """
Jane writes code for Huggingface.
"""

_claim = 'Jane is an engineer.'

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
fact_checking_model = GPT2LMHeadModel.from_pretrained('fractalego/fact-checking')
fact_checker = FactChecker(fact_checking_model, tokenizer)
# Repeats the generation several times and returns the share of Y and N answers
answer_probabilities = fact_checker.validate_with_replicas(_evidence, _claim)

print(answer_probabilities)

which prints something like the following (the exact fractions can vary between runs):

{'Y': 0.95, 'N': 0.05}
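One way to use this distribution is to abstain when neither answer clearly dominates. The sketch below is a hypothetical helper, not part of the package: the name `decide` and the default threshold of 0.8 are illustrative assumptions, and it relies only on the `{'Y': ..., 'N': ...}` shape shown above.

```python
from typing import Dict, Optional


def decide(probabilities: Dict[str, float], threshold: float = 0.8) -> Optional[bool]:
    """Map a {'Y': p, 'N': q} distribution to True/False, or None to abstain.

    The 0.8 default is an arbitrary illustrative choice.
    """
    if probabilities.get("Y", 0.0) >= threshold:
        return True
    if probabilities.get("N", 0.0) >= threshold:
        return False
    return None  # too uncertain: neither answer reaches the threshold


print(decide({"Y": 0.95, "N": 0.05}))  # True
print(decide({"Y": 0.6, "N": 0.4}))    # None
```

Abstaining on uncertain claims trades coverage for precision; lowering the threshold to 0.5 recovers a plain majority vote.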

Score on FEVER

The predictions are evaluated on a subset of the FEVER dev set, restricted to claims labelled SUPPORTS or REFUTES (claims labelled NOT ENOUGH INFO are excluded):

precision	recall	F1
0.94	0.98	0.96
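For reference, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R); with P = 0.94 and R = 0.98 this gives about 0.96, matching the table. The sketch below shows one way such scores could be computed on the two-label subset. Which label counts as the positive class is not stated here, so treating SUPPORTS as positive is an assumption, as is the function name `precision_recall_f1`.

```python
from typing import List, Tuple

LABELS = ("SUPPORTS", "REFUTES")


def precision_recall_f1(gold: List[str], pred: List[str],
                        positive: str = "SUPPORTS") -> Tuple[float, float, float]:
    """Binary precision/recall/F1, ignoring claims whose gold label is not in LABELS."""
    # Restrict to the SUPPORTS/REFUTES subset (e.g. drop NOT ENOUGH INFO).
    pairs = [(g, p) for g, p in zip(gold, pred) if g in LABELS]
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Consistency check on the reported numbers: F1 from P = 0.94, R = 0.98.
p, r = 0.94, 0.98
print(round(2 * p * r / (p + r), 2))  # 0.96
```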

These results should be taken with a large grain of salt. This is still a work in progress, and leakage from the underlying GPT-2 model may be artificially inflating the scores.


Release history

0.0.3 (this version): Dec 12, 2021
0.0.2: Dec 6, 2021
0.0.1: Dec 6, 2021

Download files


Source Distribution

fact_checking-0.0.3.tar.gz (3.6 kB), uploaded Dec 12, 2021

Built Distribution

fact_checking-0.0.3-py2-none-any.whl (5.0 kB), uploaded Dec 12, 2021, Python 2

Hashes for fact_checking-0.0.3.tar.gz

Algorithm	Hash digest
SHA256	`cd7a6b8f112a419df30d5fd01d5c815d4cfbe817d57303cc10043a2ed218110b`
MD5	`630ec1fbbcec72018b7de133fa8cb032`
BLAKE2b-256	`c3edca12419db224518ab3ec6edf9c4c7c0867ea2323cf1387182aa567f33f72`

Hashes for fact_checking-0.0.3-py2-none-any.whl

Algorithm	Hash digest
SHA256	`a934b8e1021c78a02cd700ae7fb3fb8dea71d44c5a1815f97075dc710e84bf63`
MD5	`c8e137f34e22dfbaffe813f01e1cd3c8`
BLAKE2b-256	`6ef5cac4b1e2b877589d2e5b872cba1c873c4ff62fbaa2d9bb2f1276467d2aec`
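To check a downloaded file against the SHA256 digest listed above, Python's standard `hashlib` module is enough. This is a generic sketch, not something shipped with the package; the function names and the assumption that the sdist sits in the working directory are illustrative.

```python
import hashlib

# SHA256 digest published above for the source distribution.
EXPECTED_SDIST_SHA256 = "cd7a6b8f112a419df30d5fd01d5c815d4cfbe817d57303cc10043a2ed218110b"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Usage, assuming fact_checking-0.0.3.tar.gz was downloaded to the working directory:
# print(verify_download("fact_checking-0.0.3.tar.gz", EXPECTED_SDIST_SHA256))
```

pip can perform the same check at install time through its hash-checking mode, by pinning `fact_checking==0.0.3` with a `--hash=sha256:...` entry in a requirements file.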