A spaCy custom component to extract structural information from text using the SpanRuler and regex patterns.


Span Extructure

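You might think the name is misspelled, but it ain't. It is a play on spaCy's "Span", "extract" and "structure". span_extructure is a spaCy component that builds upon the SpanRuler and regular expressions to extract structured information, e.g. dates, or amounts with currencies and multipliers. The SpanRuler's token patterns find the candidate spans, and a regex with named groups then pulls the individual fields out of each matched span into Span._.extructure.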
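As an illustration of the amounts use case, here is a hypothetical rule written in the same "patterns" / "extruct" / "label" shape as the configuration in the Usage section. The token pattern, the regex, the MONEY label and the extract_amount helper are assumptions made for this sketch, not rules or functions shipped with span_extructure. Only the regex half is exercised below, with Python's standard re module, so it runs without spaCy; the token pattern is untested against spaCy's tokenizer.

```python
import re
from typing import Optional

# Hypothetical rule for spans like "12.5 million EUR"; it is NOT shipped with span_extructure.
AMOUNT_RULE = {
    # spaCy token pattern: a number, an optional multiplier word, then a currency code.
    "patterns": [[
        {"LIKE_NUM": True},
        {"LOWER": {"IN": ["thousand", "million", "billion"]}, "OP": "?"},
        {"TEXT": {"IN": ["EUR", "USD", "GBP"]}},
    ]],
    # Named groups become the keys of the extracted dict;
    # (?i:...) makes only the multiplier case-insensitive, matching the LOWER token check.
    "extruct": (
        r"(?P<amount>\d+(?:\.\d+)?)"
        r"(?: (?P<multiplier>(?i:thousand|million|billion)))?"
        r" (?P<currency>EUR|USD|GBP)"
    ),
    "label": "MONEY",
}

AMOUNT_PATTERN = re.compile(AMOUNT_RULE["extruct"])


def extract_amount(text: str) -> Optional[dict]:
    """Hypothetical helper: return the named groups for a full match, else None."""
    match = AMOUNT_PATTERN.fullmatch(text)
    return match.groupdict() if match else None


for text in ["12.5 million EUR", "300 USD", "300 JPY"]:
    print(text, "->", extract_amount(text))
# 12.5 million EUR -> {'amount': '12.5', 'multiplier': 'million', 'currency': 'EUR'}
# 300 USD -> {'amount': '300', 'multiplier': None, 'currency': 'USD'}
# 300 JPY -> None
```

An optional group that does not participate in the match (the missing multiplier in "300 USD") still appears in groupdict() with the value None, so downstream code can rely on a fixed set of keys.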

Installation

pip install span_extructure

Usage

import spacy

nlp = spacy.blank("en")

# Optional: pass a config only to override the default values
config = {
    "overwrite": False,       # default: False
    "rules": [
        {
            "patterns": [[{"SHAPE": "dd.dd.dddd"}]],
            "extruct": r"(?P<day>[0-3]\d)\.(?P<month>0[1-9]|1[0-2])\.(?P<year>20[0-5]\d|19\d\d)",
            "label": "DATE",
        }
    ]
}
nlp.add_pipe("span_extructure", config=config)

doc = nlp("This date 21.04.1986 will be a DATE entity while the structured information will be extracted to `Span._.extructure`")
for e in doc.ents:
    print(f"{e.text}\t{e.label_}\t{e._.extructure}")

# Output:
# 21.04.1986      DATE    {'day': '21', 'month': '04', 'year': '1986'}
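The extruct step can be tried on its own with Python's standard re module. The sketch below assumes, without confirmation from the package source, that span_extructure matches the rule's regex against the span text and stores the named groups as a dict; the helper extract_fields and the use of re.fullmatch are illustrative choices, not the package's API.

```python
import re

# Same named-group pattern as in the Usage config, with the literal dots escaped.
DATE_EXTRUCT = re.compile(
    r"(?P<day>[0-3]\d)\.(?P<month>0[1-9]|1[0-2])\.(?P<year>20[0-5]\d|19\d\d)"
)


def extract_fields(text: str, pattern: re.Pattern) -> dict:
    """Hypothetical helper: return the named groups of a full match, or {} if none.

    Mirrors the assumed behaviour behind Span._.extructure.
    """
    match = pattern.fullmatch(text)
    return match.groupdict() if match else {}


print(extract_fields("21.04.1986", DATE_EXTRUCT))
# {'day': '21', 'month': '04', 'year': '1986'}
print(extract_fields("21-04-1986", DATE_EXTRUCT))
# {}
```

An unescaped "." matches any character, which is why the dots are escaped: with "\." a string such as 21-04-1986 is rejected, and the month group also rejects impossible values such as 13.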


Release history

0.1.1 (this version): Oct 1, 2022

0.1.0: Oct 1, 2022

Download files

Source Distribution

span-extructure-0.1.1.tar.gz (4.1 kB), uploaded Oct 1, 2022 (Source)

Built Distribution

span_extructure-0.1.1-py3-none-any.whl (4.2 kB), uploaded Oct 1, 2022 (Python 3)

Hashes for span-extructure-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`b730a58fe0b4936c22f7d8f8b8cdd50aa66a4cd41283a561302b45e6828c30fc`
MD5	`36b8c5076d33bdfc88251aebb4f2648c`
BLAKE2b-256	`542b2b8e092b7f028ff8b3d1c26ddcaa86f3c9469b58736e0f1099a297f21ec5`

Hashes for span_extructure-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`74ae557d2a39b76ab3710a4b39765a4237a5874fee23e804a61352e5f6e558fd`
MD5	`cbdb63185683aa8b664cdb35a9c46e85`
BLAKE2b-256	`06cba3303408cc31fb49eefc4ae0b98c8373404478e0e7d0ed26a99db7f5ed00`