A spaCy pipeline object for negation.


negspacy: negation for spaCy

spaCy pipeline object for negating concepts in text. Based on the NegEx algorithm.

NegEx - A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Chapman, Bridewell, Hanbury, Cooper, Buchanan. https://doi.org/10.1006/jbin.2001.1029

Installation and usage

Install the library.

pip install negspacy

Import library and spaCy.

import spacy
from negspacy.negation import Negex

Load a spaCy language model and add the negspacy pipeline object. Filtering on entity types is optional.

nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON","ORG"])
nlp.add_pipe(negex, last=True)

View negations.

doc = nlp("She does not like Steve Jobs but likes Apple products.")

for e in doc.ents:
    print(e.text, e._.negex)

Steve Jobs True
Apple False

Consider pairing with scispacy to find UMLS concepts in text and process negations.

NegEx Patterns

psuedo_negations - phrases that are false triggers, ambiguous negations, or double negatives
preceding_negations - negation phrases that precede an entity
following_negations - negation phrases that follow an entity
termination - phrases that cut a sentence into parts for the purposes of negation detection (e.g., "but")
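
To illustrate how the four pattern categories interact, here is a minimal pure-Python sketch of NegEx-style scoping. It is not negspacy's implementation: the helper names (`find_phrases`, `is_negated`), the tiny phrase lists, and the regex tokenization are all assumptions made for this example, and the real algorithm applies more rules than shown here.

```python
import re

# Toy phrase lists for illustration only; these are NOT negspacy's termsets.
PATTERNS = {
    "pseudo": ["not only"],          # looks like a negation but is not one
    "preceding": ["no", "not"],      # negates entities that come after it
    "following": ["was ruled out"],  # negates entities that come before it
    "termination": ["but"],          # ends the scope of a negation
}


def find_phrases(tokens, phrases):
    """Return (start, end) token spans at which any phrase occurs."""
    spans = []
    for phrase in phrases:
        words = phrase.split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                spans.append((i, i + len(words)))
    return spans


def is_negated(text, entity, patterns=PATTERNS):
    """Hypothetical helper: decide whether `entity` is negated in `text`."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    ent_words = re.findall(r"[a-z0-9']+", entity.lower())
    ent_spans = find_phrases(tokens, [" ".join(ent_words)])
    if not ent_spans:
        raise ValueError(f"entity {entity!r} not found in text")
    ent_start, ent_end = ent_spans[0]

    pseudo = find_phrases(tokens, patterns["pseudo"])
    cuts = [start for start, _ in find_phrases(tokens, patterns["termination"])]

    def is_pseudo(start, end):
        # A trigger that lies inside a pseudo-negation phrase is ignored.
        return any(p_start <= start and end <= p_end for p_start, p_end in pseudo)

    def cut_between(left, right):
        # True if a termination phrase starts between the two positions.
        return any(left <= c < right for c in cuts)

    for start, end in find_phrases(tokens, patterns["preceding"]):
        if end <= ent_start and not is_pseudo(start, end) and not cut_between(end, ent_start):
            return True
    for start, end in find_phrases(tokens, patterns["following"]):
        if start >= ent_end and not is_pseudo(start, end) and not cut_between(ent_end, start):
            return True
    return False


sentence = "She does not like Steve Jobs but likes Apple products."
print("Steve Jobs", is_negated(sentence, "Steve Jobs"))  # Steve Jobs True
print("Apple", is_negated(sentence, "Apple"))            # Apple False
```

In this sketch, "not" negates "Steve Jobs", while the termination phrase "but" stops the negation from reaching "Apple".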

Termsets

Designate the termset to use; en_clinical is used by default.

negex = Negex(nlp, language="en_clinical")

en = phrases for general English language text
en_clinical (default) = adds phrases specific to the clinical domain to general English
en_clinical_sensitive = adds additional phrases to help rule out historical and possibly irrelevant entities
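
One way to picture the relationship between these termsets is as layers, each adding phrases to the one before it. The sketch below assumes a termset is a dictionary of phrase lists keyed by pattern category. The helper `extend_termset` and every phrase in it are hypothetical placeholders, not the lists shipped with negspacy.

```python
# Illustrative only: the phrases are made-up placeholders, not negspacy's lists.
EN = {
    "preceding_negations": ["no", "not", "without"],
    "following_negations": ["was ruled out"],
    "termination": ["but", "however"],
}

# Hypothetical clinical additions layered on top of the general English set.
CLINICAL_EXTRAS = {
    "preceding_negations": ["denies", "negative for"],
    "following_negations": ["unlikely"],
}


def extend_termset(base, extras):
    """Return a new termset: base phrases plus extras per category, no duplicates."""
    merged = {category: list(phrases) for category, phrases in base.items()}
    for category, phrases in extras.items():
        bucket = merged.setdefault(category, [])
        for phrase in phrases:
            if phrase not in bucket:
                bucket.append(phrase)
    return merged


EN_CLINICAL = extend_termset(EN, CLINICAL_EXTRAS)
print(EN_CLINICAL["preceding_negations"])
```

The base dictionary is copied rather than mutated, so the general English set stays usable on its own.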

Additional Functionality

Use your own patterns or view patterns in use

Use your own patterns

nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, termination=["but", "however", "nevertheless", "except"])

View patterns in use

# get_patterns is a method, so it must be called to return the patterns
patterns_dict = negex.get_patterns()
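
The sketch below shows one plausible way custom patterns could combine with a termset. It assumes that a list passed for a category replaces the termset's list for that category, while the other categories keep their defaults. The function `resolve_patterns` and the default phrases are hypothetical and only stand in for the behavior described above.

```python
# Placeholder defaults for illustration; not the phrases shipped with negspacy.
DEFAULTS = {
    "preceding_negations": ["no", "not"],
    "following_negations": ["was ruled out"],
    "termination": ["but"],
}


def resolve_patterns(defaults, **custom):
    """Use the custom list for a category when one is given, else the default."""
    unknown = set(custom) - set(defaults)
    if unknown:
        raise ValueError(f"unknown pattern categories: {sorted(unknown)}")
    return {
        category: list(custom.get(category) or phrases)
        for category, phrases in defaults.items()
    }


patterns = resolve_patterns(
    DEFAULTS, termination=["but", "however", "nevertheless", "except"]
)
print(patterns["termination"])
print(patterns["preceding_negations"])
```

Inspecting the resolved dictionary plays the same role as viewing the patterns in use: it confirms which phrases are active after any overrides.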

Negations in noun chunks

Depending on the Named Entity Recognition model you are using, you may have negations "chunked together" with nouns. For example, when using scispacy:

nlp = spacy.load("en_core_sci_sm")
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text)

# no headache

This would cause the NegEx algorithm to miss the preceding negation, because the negation word is part of the entity itself. To account for this, you can add a chunk_prefix:

nlp = spacy.load("en_core_sci_sm")
negex = Negex(nlp, language="en_clinical", chunk_prefix=["no"])
nlp.add_pipe(negex)
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text, e._.negex)

# no headache True
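
The idea behind a chunk prefix can be sketched in plain Python: if the entity text itself begins with one of the prefix phrases, the entity is treated as negated. The helper `has_chunk_prefix` is a hypothetical name for this illustration and is not part of negspacy.

```python
def has_chunk_prefix(entity_text, chunk_prefix):
    """True if the entity text begins with one of the prefix phrases.

    Matching is done on whole words, so the prefix "no" matches
    "no headache" but not "nodule".
    """
    words = entity_text.lower().split()
    for prefix in chunk_prefix:
        prefix_words = prefix.lower().split()
        if prefix_words and words[:len(prefix_words)] == prefix_words:
            return True
    return False


print(has_chunk_prefix("no headache", ["no"]))  # True
print(has_chunk_prefix("headache", ["no"]))     # False
```

Comparing whole words rather than raw string prefixes avoids flagging entities that merely start with the same letters.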

Contributing

contributing

Authors

Jeno Pizarro

License

license

API Documentation

Docs


Release history

1.0.4 - May 20, 2023
1.0.3 - May 25, 2022
1.0.2 - Jan 20, 2022
1.0.1 - Oct 25, 2021
1.0.0 - Feb 22, 2021
0.1.9 - Nov 18, 2020
0.1.8 - Oct 1, 2020
0.1.7 - Mar 13, 2020 (this version)
0.1.6 - Nov 23, 2019
0.1.5 - Oct 16, 2019
0.1.4 - Sep 25, 2019
0.1.2 - Aug 18, 2019
0.1.1 - Aug 16, 2019
0.1.0a0 - Aug 14, 2019 (pre-release)

Download files


Source Distribution

negspacy-0.1.7.tar.gz (7.8 kB)

Uploaded Mar 13, 2020 Source

Hashes for negspacy-0.1.7.tar.gz

Algorithm	Hash digest
SHA256	`74b930b2aa5d834a6e9496e96abc025a00c4ae292e861899258924a24e25538d`
MD5	`c0c4762f4c0c19709934691f0bd27635`
BLAKE2b-256	`b7690c8f46cef8d8b6ee8925270e2d48c7ebd93153dcfbd28db778eaf3588f3f`