A spaCy custom component to extract structural information from text using the SpanRuler and regex patterns.
Span Extructure
You might think the name is misspelled, but it ain't. It is a wordplay on spaCy's Span, extract and structure. span_extructure is a spaCy component that builds upon SpanRuler and regex to extract structured information, e.g. dates or amounts with currencies and multipliers.
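To make "amounts with currency and multipliers" concrete, here is a minimal sketch using only Python's `re` module. The pattern, the group names (`currency`, `value`, `multiplier`) and the helper `parse_amount` are all hypothetical and chosen for illustration; they are not taken from span_extructure itself.

```python
import re

# Hypothetical pattern (an assumption, not part of span_extructure):
# a currency symbol, a number, and an optional multiplier word.
AMOUNT_PATTERN = re.compile(
    r"(?P<currency>[$€£])\s?"
    r"(?P<value>\d+(?:\.\d+)?)\s?"
    r"(?P<multiplier>thousand|million|billion)?"
)


def parse_amount(text):
    """Return the named groups of the first amount found, or None."""
    match = AMOUNT_PATTERN.search(text)
    return match.groupdict() if match else None


print(parse_amount("Revenue reached $3.5 million last year."))
# {'currency': '$', 'value': '3.5', 'multiplier': 'million'}
```

Each named group becomes one key of the resulting dictionary, and an optional group that does not take part in the match is reported as `None`.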
Installation
pip install span_extructure
Usage
import spacy

nlp = spacy.blank("en")

# Optionally add a config if it differs from the default values
config = {
    "overwrite": False,  # default: False
    "rules": [
        {
            # SpanRuler token patterns that select candidate spans
            "patterns": [[{"SHAPE": "dd.dd.dddd"}]],
            # Regex with named groups; the dots are escaped so they match literal dots only
            "extruct": r"(?P<day>[0-3]\d)\.(?P<month>0[1-9]|1[0-2])\.(?P<year>20[0-5]\d|19\d\d)",
            "label": "DATE",
        }
    ],
}

nlp.add_pipe("span_extructure", config=config)
doc = nlp("This date 21.04.1986 will be a DATE entity while the structured information will be extracted to `Span._.extructure`")

for e in doc.ents:
    print(f"{e.text}\t{e.label_}\t{e._.extructure}")

# Output:
# 21.04.1986	DATE	{'day': '21', 'month': '04', 'year': '1986'}
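The dictionary stored in `Span._.extructure` has the same shape as the named groups of a regex match. The sketch below reproduces that output with the standard `re` module alone. It assumes the component does something comparable to `Match.groupdict()` internally, which the project description does not confirm; the helper name `extract_fields` is hypothetical.

```python
import re

# The date regex from the rule above, with the separators escaped
# so that only literal dots are accepted between the groups.
DATE_PATTERN = re.compile(
    r"(?P<day>[0-3]\d)\.(?P<month>0[1-9]|1[0-2])\.(?P<year>20[0-5]\d|19\d\d)"
)


def extract_fields(text):
    """Return the named groups of the first match as a dict, or None."""
    match = DATE_PATTERN.search(text)
    return match.groupdict() if match else None


print(extract_fields("21.04.1986"))
# {'day': '21', 'month': '04', 'year': '1986'}
```

All values are strings, as in the output above, so any conversion to integers or date objects is left to the caller.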