Combination of the RapidFuzz library with spaCy's PhraseMatcher
phruzz-matcher
The goal of this component is to find matches between a spaCy doc and a list of phrases when there are no exact ("perfect") matches because of typos or abbreviations. To learn more about spaCy's PhraseMatcher, see https://spacy.io/usage/rule-based-matching#phrasematcher
Installation (dev)
git clone https://github.com/mjvallone/phruzz_matcher_spacy.git
Configuration (dev)
- Create a virtualenv using Python 3 (installation instructions: https://virtualenvwrapper.readthedocs.io/en/latest/install.html):
  virtualenv venv
- Activate the virtualenv:
  . venv/bin/activate
- Install the requirements:
  pip install -r requirements.txt
Usage
First, install the package:
pip install phruzz_matcher
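To add it to your pipeline, register a factory for it and add the component by name:
from spacy.language import Language
from phruzz_matcher.phrase_matcher import PhruzzMatcher
# list_of_phrases, entity_label and match_percentage are your own values (see Parameters)
@Language.factory("phrase_matcher")
def phrase_matcher(nlp: Language, name: str):
    return PhruzzMatcher(nlp, list_of_phrases, entity_label, match_percentage)
# nlp is a spaCy pipeline you have already created, e.g. with spacy.blank() or spacy.load()
nlp.add_pipe("phrase_matcher")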
Parameters
nlp: the spaCy model you use (it was tested with the different Spanish models from spaCy).
list_of_phrases: the list of phrases you want to find in the spaCy doc.
entity_label: the entity label assigned to each match in the spaCy doc (the matches then appear in doc.ents).
match_percentage: the minimum similarity percentage between a span of the spaCy doc and a phrase for the match to be kept. The higher the percentage, the fewer differences are tolerated.
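To get a feel for how match_percentage acts as a cut-off, the threshold can be sketched with the standard library. This is a minimal sketch, not the package's code: it assumes a RapidFuzz-style score from 0 to 100 and uses difflib.SequenceMatcher in place of RapidFuzz's fuzz.ratio. The helpers similarity and keep_matches are hypothetical, and lowercasing both strings before comparing is also an assumption.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # 0-100 score; difflib stands in for RapidFuzz's fuzz.ratio (assumption),
    # and both strings are lowercased first (also an assumption)
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def keep_matches(phrase: str, candidates: list[str], match_percentage: float) -> list[str]:
    # Hypothetical helper: keep the candidates whose score reaches the threshold
    return [c for c in candidates if similarity(phrase, c) >= match_percentage]


candidates = ["brad pit", "brad", "bread pot"]
print(keep_matches("Brad Pitt", candidates, 85))  # ['brad pit']
print(keep_matches("Brad Pitt", candidates, 95))  # []
```

With this stand-in scorer, "brad pit" scores about 94 against "Brad Pitt", so it survives a threshold of 85 but not 95: raising the percentage tolerates fewer differences.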
Result
According to the spaCy documentation, "A pipeline component is a function that receives a Doc object, modifies it and returns it", so PhruzzMatcher also returns a Doc object, with the fuzzy matches added to doc.ents. For further information, see https://spacy.io/usage/processing-pipelines#custom-components
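The receive-modify-return contract can be illustrated without spaCy. The Doc dataclass, make_label_component and run_pipeline below are hypothetical stand-ins, not spaCy's API; the only point is that each component takes a Doc, adds to it, and hands the same Doc on to the next component.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    # Hypothetical stand-in for spacy.tokens.Doc: tokens plus (start, end, label) entities
    tokens: list[str]
    ents: list[tuple[int, int, str]] = field(default_factory=list)


Component = Callable[[Doc], Doc]


def make_label_component(words: set[str], label: str) -> Component:
    # Returns a component that receives a Doc, adds entities to it and returns it
    def component(doc: Doc) -> Doc:
        for i, tok in enumerate(doc.tokens):
            if tok in words:
                doc.ents.append((i, i + 1, label))
        return doc
    return component


def run_pipeline(doc: Doc, components: list[Component]) -> Doc:
    # Each component's returned Doc becomes the next component's input
    for component in components:
        doc = component(doc)
    return doc


doc = run_pipeline(Doc(["vi", "a", "Demi", "Moore"]), [make_label_component({"Demi"}, "PERSON")])
print(doc.ents)  # [(2, 3, 'PERSON')]
```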
Example
import spacy
from spacy.language import Language
from phruzz_matcher.phrase_matcher import PhruzzMatcher
famous_people = [
"Brad Pitt",
"Demi Moore",
"Bruce Willis",
"Jim Carrey",
]
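@Language.factory("phrase_matcher")
def phrase_matcher(nlp: Language, name: str):
    # 85 is the match_percentage: the minimum similarity needed to keep a match
    return PhruzzMatcher(nlp, famous_people, "FAMOUS_PEOPLE", 85)
nlp = spacy.blank("es")
nlp.add_pipe("phrase_matcher")
# English: "The other day I went to a bar where I saw brad pit and Demi Moore; they were having a few beers while chatting about their own affairs."
doc = nlp("El otro día fui a un bar donde vi a brad pit y a Demi Moore, estaban tomando unas cervezas mientras charlaban de sus asuntos.")
print(f"doc.ents: {doc.ents}")
# doc.ents: (brad pit, Demi Moore)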
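What a fuzzy phrase matcher does with this sentence can be approximated end to end with the standard library. This sketch is an assumption about the general approach, not PhruzzMatcher's implementation: it tokenizes with a regular expression, compares every window that has as many tokens as the phrase, uses difflib.SequenceMatcher instead of RapidFuzz, and keeps the best-scoring non-overlapping windows. fuzzy_phrase_matches is a hypothetical name.

```python
import re
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # 0-100 score; difflib stands in for RapidFuzz's fuzz.ratio (assumption)
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_phrase_matches(text: str, phrases: list[str], label: str,
                         match_percentage: float) -> list[tuple[str, str]]:
    """Hypothetical re-implementation of the idea, not PhruzzMatcher's code."""
    tokens = re.findall(r"\w+", text)
    candidates = []  # (score, start, end) for every window above the threshold
    for phrase in phrases:
        n = len(phrase.split())
        for start in range(len(tokens) - n + 1):
            window = " ".join(tokens[start:start + n])
            score = similarity(window, phrase)
            if score >= match_percentage:
                candidates.append((score, start, start + n))
    # Greedily keep the highest-scoring windows that do not overlap
    taken: set[int] = set()
    matches = []
    for score, start, end in sorted(candidates, reverse=True):
        if taken.isdisjoint(range(start, end)):
            taken.update(range(start, end))
            matches.append((start, " ".join(tokens[start:end]), label))
    # Return (matched text, label) pairs in document order
    return [(span, lbl) for _, span, lbl in sorted(matches)]


text = ("El otro día fui a un bar donde vi a brad pit y a Demi Moore, estaban "
        "tomando unas cervezas mientras charlaban de sus asuntos.")
people = ["Brad Pitt", "Demi Moore", "Bruce Willis", "Jim Carrey"]
print(fuzzy_phrase_matches(text, people, "FAMOUS_PEOPLE", 85))
# [('brad pit', 'FAMOUS_PEOPLE'), ('Demi Moore', 'FAMOUS_PEOPLE')]
```

Restricting windows to the phrase's token count keeps the search linear in the document length per phrase; a real matcher may consider other window sizes to handle abbreviations that change the number of tokens.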
Built Distribution
Hashes for phruzz_matcher-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0c413eac7a9afef74aaea00ea1fa471e495186a7f8bb1d51281607dcea29f54e
MD5 | 6ea82653b26b77df560b8cc3a8e434b7
BLAKE2b-256 | 84714c7f8fa9dbb0b169787f582eb9a2f25f3f58fe5bd2b536f36cdc8eb17802