phruzz-matcher

A combination of the RapidFuzz library with the Spacy PhraseMatcher. The goal of this component is to find matches between a Spacy doc and a list of phrases when there are no "perfect matches" because of typos or abbreviations. To learn more about the Spacy PhraseMatcher, see https://spacy.io/usage/rule-based-matching#phrasematcher

Installation (dev)

    git clone https://github.com/mjvallone/phruzz_matcher_spacy.git

Configuration (dev)

Create a virtualenv using python3 (see https://virtualenvwrapper.readthedocs.io/en/latest/install.html)
```
 virtualenv venv
```
Activate the virtualenv
```
 . venv/bin/activate
```
Install requirements
```
 pip install -r requirements.txt
```

Usage

First, install the package:

    pip install phruzz_matcher

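To add it to your pipeline, you can do something like this:

    from spacy.language import Language
    from phruzz_matcher.phrase_matcher import PhruzzMatcher

    # list_of_phrases, entity_label and match_percentage must be defined
    # beforehand (see Parameters).
    @Language.factory("phrase_matcher")
    def phrase_matcher(nlp: Language, name: str):
        return PhruzzMatcher(nlp, list_of_phrases, entity_label, match_percentage)

    # nlp is the Spacy pipeline you already created or loaded.
    nlp.add_pipe("phrase_matcher")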

Parameters

- nlp: the Spacy model you use (it was tested with the different Spanish models from Spacy).
- list_of_phrases: the list of phrases you want to find in the Spacy doc.
- entity_label: the entity label that will be assigned to the matches found in the Spacy doc.
- match_percentage: the minimum similarity percentage required to keep a match between text from the Spacy doc and the list of phrases. The higher the percentage, the fewer differences are tolerated when looking for a match.
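
To illustrate how match_percentage acts as a threshold, here is a minimal sketch. It does not use PhruzzMatcher or RapidFuzz: as a stand-in it scores strings with `difflib.SequenceMatcher` from the Python standard library, and `similarity` and `find_fuzzy_matches` are hypothetical helpers written only for this illustration, so the actual scores computed by the component may differ.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_fuzzy_matches(text, phrases, match_percentage):
    """Return (candidate, phrase) pairs whose score is >= match_percentage.

    Candidates are windows of whitespace-separated words that have the
    same number of words as the phrase they are compared against.
    """
    tokens = text.split()
    matches = []
    for phrase in phrases:
        size = len(phrase.split())
        for start in range(len(tokens) - size + 1):
            candidate = " ".join(tokens[start:start + size])
            if similarity(candidate, phrase) >= match_percentage:
                matches.append((candidate, phrase))
    return matches


text = "vi a brad pit y a Demi Moore"
phrases = ["Brad Pitt", "Demi Moore"]

# A threshold of 100 keeps only exact (case-insensitive) matches.
strict = find_fuzzy_matches(text, phrases, 100)
# A lower threshold also tolerates the typo "brad pit".
lenient = find_fuzzy_matches(text, phrases, 85)

print(strict)
print(lenient)
```

In this sketch, lowering the threshold from 100 to 85 is what lets the misspelled "brad pit" through, which mirrors the trade-off described for match_percentage.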

Result

According to the Spacy documentation, "a pipeline component is a function that receives a Doc object, modifies it and returns it", so PhruzzMatcher returns a Doc object. For further information, visit https://spacy.io/usage/processing-pipelines#custom-components

Example

    import spacy
    from spacy.language import Language
    from phruzz_matcher.phrase_matcher import PhruzzMatcher

    famous_people = [
        "Brad Pitt",
        "Demi Moore",
        "Bruce Willis",
        "Jim Carrey",
    ]

    @Language.factory("phrase_matcher")
    def phrase_matcher(nlp: Language, name: str):
        return PhruzzMatcher(nlp, famous_people, "FAMOUS_PEOPLE", 85)

    nlp = spacy.blank("es")
    nlp.add_pipe("phrase_matcher")

    doc = nlp("El otro día fui a un bar donde vi a brad pit y a Demi Moore, estaban tomando unas cervezas mientras charlaban de sus asuntos.")
    print(f"doc.ents: {doc.ents}")
    # doc.ents: (brad pit, Demi Moore)

Project details

Intended Audience
- Developers
- Science/Research

License
- OSI Approved :: MIT License

Operating System
- OS Independent

Programming Language
- Python :: 3
- Python :: 3.7

Topic
- Scientific/Engineering

Release history

- 0.0.4 (this version): Sep 29, 2021
- 0.0.3: Sep 29, 2021
- 0.0.2: Sep 17, 2021
- 0.0.1: Sep 16, 2021

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

phruzz_matcher-0.0.4-py3-none-any.whl (4.5 kB)

Uploaded Sep 29, 2021 Python 3

Hashes for phruzz_matcher-0.0.4-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `0c413eac7a9afef74aaea00ea1fa471e495186a7f8bb1d51281607dcea29f54e` |
| MD5 | `6ea82653b26b77df560b8cc3a8e434b7` |
| BLAKE2b-256 | `84714c7f8fa9dbb0b169787f582eb9a2f25f3f58fe5bd2b536f36cdc8eb17802` |
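
If you download the wheel manually, you can compare it against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; `sha256_of_file` and `matches_expected` are hypothetical helper names, and the wheel is assumed to be in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for phruzz_matcher-0.0.4-py3-none-any.whl
EXPECTED_SHA256 = "0c413eac7a9afef74aaea00ea1fa471e495186a7f8bb1d51281607dcea29f54e"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded file; adjust as needed.
    wheel = Path("phruzz_matcher-0.0.4-py3-none-any.whl")
    if wheel.exists():
        print("OK" if matches_expected(wheel) else "MISMATCH")
    else:
        print(f"{wheel} not found")
```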