phruzz-matcher

A combination of the RapidFuzz library and the spaCy PhraseMatcher. The goal of this component is to find matches between a spaCy doc and a list of phrases when there are no "perfect matches" because of typos or abbreviations. To learn more about the spaCy PhraseMatcher, see https://spacy.io/usage/rule-based-matching#phrasematcher
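
To illustrate the idea, here is a minimal sketch of fuzzy phrase lookup. It is not the library's actual algorithm: it uses `difflib` from the Python standard library as a stand-in for RapidFuzz, so the scores are only illustrative, and the helper names `similarity` and `best_window` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score (difflib stand-in, not RapidFuzz)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_window(text: str, phrase: str):
    """Slide a window with as many words as `phrase` over `text`.

    Returns the best-scoring window and its score.
    """
    words = text.split()
    size = len(phrase.split())
    best_span, best_score = "", 0.0
    for start in range(len(words) - size + 1):
        span = " ".join(words[start:start + size])
        score = similarity(span, phrase)
        if score > best_score:
            best_span, best_score = span, score
    return best_span, best_score


text = "Yesterday I watched a Bruce Wilis movie"  # note the typo in the surname
exact_hit = "Bruce Willis" in text                # exact lookup misses the typo
span, score = best_window(text, "Bruce Willis")   # fuzzy lookup still finds it
print(exact_hit, span, round(score, 1))
```

The exact lookup fails because of the single missing letter, while the fuzzy score of the closest window stays high enough to be treated as a match.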

Installation (dev)

    git clone https://github.com/mjvallone/phruzz_matcher_spacy.git

Configuration (dev)

Create a virtualenv using Python 3 (see https://virtualenvwrapper.readthedocs.io/en/latest/install.html for setup)
```
 virtualenv venv
```
Activate the virtualenv
```
 . venv/bin/activate
```
Install requirements
```
 pip install -r requirements.txt
```

Usage

First, install the package:

    pip install phruzz_matcher

If you want to add it to your pipeline, you could do something like this:

    from spacy.language import Language
    from phruzz_matcher.phrase_matcher import PhruzzMatcher

    # list_of_phrases, entity_label and match_percentage are described under Parameters
    @Language.factory("phrase_matcher")
    def phrase_matcher(nlp: Language, name: str):
        return PhruzzMatcher(nlp, list_of_phrases, entity_label, match_percentage)

    nlp.add_pipe("phrase_matcher")

Parameters

nlp: the spaCy model you use (tested with the different Spanish spaCy models).
list_of_phrases: the list of phrases you want to find in the spaCy doc.
entity_label: the entity label to assign to the matches found in the spaCy doc.
match_percentage: the minimum similarity percentage at which a match between text from the spaCy doc and the list of phrases is kept. The higher the percentage, the fewer differences are tolerated when finding a match.
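
The effect of match_percentage can be sketched as a simple threshold filter. This is an illustration only, again using `difflib` as a stand-in for RapidFuzz (real RapidFuzz scores will differ), and `keep_matches` is a hypothetical helper, not part of phruzz_matcher.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score (difflib stand-in, not RapidFuzz)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def keep_matches(candidates, phrases, match_percentage):
    """Return (candidate, phrase) pairs scoring at least match_percentage."""
    kept = []
    for candidate in candidates:
        for phrase in phrases:
            if similarity(candidate, phrase) >= match_percentage:
                kept.append((candidate, phrase))
    return kept


phrases = ["Jim Carrey", "Demi Moore"]
candidates = ["Jim Carrey", "Jim Carey", "Jim Crey", "Demi Lovato"]

# Lowering the threshold tolerates more differences, so more candidates survive.
for threshold in (100, 92, 80):
    kept = [candidate for candidate, _ in keep_matches(candidates, phrases, threshold)]
    print(threshold, kept)
```

A threshold of 100 keeps only the exact spelling, while lower values progressively admit the misspelled variants; a different name such as "Demi Lovato" is still rejected at 80.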

Result

According to the spaCy documentation, "a pipeline component is a function that receives a Doc object, modifies it and returns it", so the PhruzzMatcher returns a Doc object. For further information, visit https://spacy.io/usage/processing-pipelines#custom-components
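
The contract quoted above can be seen in a minimal custom component. This sketch assumes spaCy v3 is installed; the component name `mark_first_token` and the `DEMO` label are hypothetical and unrelated to PhruzzMatcher. A blank Spanish pipeline is used so that no trained model has to be downloaded.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc, Span


@Language.component("mark_first_token")  # hypothetical component name
def mark_first_token(doc: Doc) -> Doc:
    # Modify the Doc: label its first token as a DEMO entity.
    if len(doc) > 0:
        doc.ents = [Span(doc, 0, 1, label="DEMO")]
    # Return the same Doc so the next pipeline component can use it.
    return doc


nlp = spacy.blank("es")           # tokenizer only, no trained components
nlp.add_pipe("mark_first_token")

doc = nlp("Hola mundo")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

Entities written by a component in this way are read back through `doc.ents`, which is also where the labels set by a matcher component would be expected to appear.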

Example

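    import spacy
    from spacy.language import Language
    from phruzz_matcher.phrase_matcher import PhruzzMatcher

    nlp = spacy.load("es_core_news_lg")
    famous_people = [
        "Brad Pitt",
        "Demi Moore",
        "Bruce Willis",
        "Jim Carrey",
    ]

    # Matches scoring at least 92% are labeled FAMOUS_PEOPLE
    @Language.factory("phrase_matcher")
    def phrase_matcher(nlp: Language, name: str):
        return PhruzzMatcher(nlp, famous_people, "FAMOUS_PEOPLE", 92)

    nlp.add_pipe("phrase_matcher")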

Release history

0.0.4: Sep 29, 2021

0.0.3: Sep 29, 2021

0.0.2 (this version): Sep 17, 2021

0.0.1: Sep 16, 2021

Download files

Source Distribution

phruzz_matcher-0.0.2.tar.gz (3.5 kB)

Uploaded Sep 17, 2021 Source

Built Distribution

phruzz_matcher-0.0.2-py3-none-any.whl (4.3 kB)

Uploaded Sep 17, 2021 Python 3

Hashes for phruzz_matcher-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`f3284447317908ea81cfbd554b2f29fbb3e5aafc2de6875e302cefaccf30ab29`
MD5	`0387000bac117d0112c9d20fe6084527`
BLAKE2b-256	`669e681769b72891b76bc13680dfec5092399df179a5b364a674d347448153c5`

Hashes for phruzz_matcher-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b19c89679bfa73cd9bcb498de8f19a895a7db8ac8efadf37a142be98ce7bdb0b`
MD5	`8326d4e75768f6fdf955a6e6fecb179b`
BLAKE2b-256	`c3eb16b11b85d58d6ec82d08ce7ebb4ad1b83583e6dbeccf2d420b5184fc074f`