Skip to main content

A spaCy custom component that extends the SpanRuler with a second opinion

Project description

Second Opinion Ruler

second_opinion_ruler is a spaCy component that extends SpanRuler with a second opinion. For each pattern you can provide a callback (available in registry.misc) on the matched Span - with this you can decide to discard the match, add additional spans to the match and/or mutate the matched span, e.g. add a parsed datetime to a custom attribute.

Installation

pip install second_opinion_ruler

Usage

import spacy
from spacy.tokens import Span
from spacy.util import registry

# create date as custom attribute extension
Span.set_extension("date", default=None)

# add datetime parser to registry.misc
# IMPORTANT: first argument has to be Span and the return type has to be list[Span]
@registry.misc("to_datetime.v1")
def to_datetime(span: Span, format: str, attr: str = "date") -> list[Span]:

    # parse the date
    date = datetime.datetime.strptime(span.text, format)

    # add the parsed date to the custom attribute
    span._.set(attr, date)

    # just return matched span
    return [span]

# load a model
nlp = spacy.blank("en")

# add the second opinion ruler
ruler = nlp.add_pipe("second_opinion_ruler", config={
    "validate": True,
    "annotate_ents": True,
})

# add a pattern with a second opinion handler (on_match)
ruler.add_patterns([
    {
        "label": "DATE",
        "pattern": "21.04.1986",
        "on_match": {
            "id": "to_datetime.v1",
            "kwargs": {"format": "%d.%m.%Y", "attr": "my_date"},
        },
    }
])

doc = nlp("This date 21.04.1986 will be a DATE entity while the structured information will be extracted to `Span._.extructure`")

# verify
assert doc.ents[0]._.date == datetime.datetime(1986, 4, 21)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

second-opinion-ruler-0.1.0.tar.gz (5.6 kB view hashes)

Uploaded Source

Built Distribution

second_opinion_ruler-0.1.0-py3-none-any.whl (5.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page