PolyFuzz performs fuzzy string matching and string grouping, and includes extensive evaluation functions.
PolyFuzz is meant to bring fuzzy string matching techniques together within a single framework.
Currently, methods include Levenshtein distance with RapidFuzz, a character-based n-gram TF-IDF, word embedding techniques such as FastText and GloVe, and finally 🤗 transformers embeddings.
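To give a feel for the character-based n-gram TF-IDF approach mentioned above, here is a minimal standard-library sketch. It is not PolyFuzz's implementation: the helper names (`char_ngrams`, `tfidf_vectors`, `cosine`), the trigram size, the space padding, and the smoothed IDF formula are all assumptions made for illustration, so the scores will differ from those PolyFuzz reports.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of `text` (hypothetical helper)."""
    padded = f" {text} "  # assumption: pad with spaces so word edges form n-grams
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(strings, n=3):
    """Build one sparse TF-IDF vector (a dict) per input string."""
    counts_per_string = [Counter(char_ngrams(s, n)) for s in strings]
    # Document frequency: in how many strings does each n-gram occur?
    doc_freq = Counter(g for counts in counts_per_string for g in counts)
    total = len(strings)
    vectors = []
    for counts in counts_per_string:
        # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1
        vectors.append({
            g: tf * (math.log((1 + total) / (1 + doc_freq[g])) + 1)
            for g, tf in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(gram, 0.0) for gram, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


strings = ["apple", "apples", "appl", "house", "mouse", "similarity"]
vecs = dict(zip(strings, tfidf_vectors(strings)))

print(cosine(vecs["appl"], vecs["apple"]))        # shares several trigrams
print(cosine(vecs["house"], vecs["mouse"]))       # shares the "ouse" ending
print(cosine(vecs["similarity"], vecs["apple"]))  # no shared trigrams
```

Strings that share many character n-grams score close to 1, while strings with no n-grams in common score exactly 0, which is why this family of methods tolerates typos and small spelling variations.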
You can use your own custom models for both fuzzy string matching and string grouping.
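Conceptually, a custom matching model only needs to map each string in one list to its best counterpart in another list, together with a similarity score. The sketch below shows that core logic with Python's standard `difflib`. The function name `match_strings` and the `min_similarity` parameter are hypothetical, and the sketch deliberately does not show how such logic is registered with PolyFuzz, since that interface is not described in this text.

```python
from difflib import SequenceMatcher


def match_strings(from_list, to_list, min_similarity=0.5):
    """For each string in from_list, find the most similar string in to_list.

    Returns a list of (from, to, similarity) tuples. Matches scoring below
    `min_similarity` (an assumed cutoff) are reported as (from, None, 0.0).
    """
    rows = []
    for source in from_list:
        best_target, best_score = None, 0.0
        for target in to_list:
            # ratio() returns a similarity in [0, 1] based on matching blocks
            score = SequenceMatcher(None, source, target).ratio()
            if score > best_score:
                best_target, best_score = target, score
        if best_score < min_similarity:
            best_target, best_score = None, 0.0
        rows.append((source, best_target, round(best_score, 3)))
    return rows


from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

for row in match_strings(from_list, to_list):
    print(row)
```

Any scoring function that returns a value between 0 and 1 could replace `SequenceMatcher` here; the surrounding loop stays the same.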
The project is accompanied by a Medium post.
Getting Started
from polyfuzz import PolyFuzz
from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]
model = PolyFuzz("TF-IDF").match(from_list, to_list)
The resulting string matches can be accessed through model.get_matches():
>>> model.get_matches()
      From      To  Similarity
     apple   apple    1.000000
    apples  apples    1.000000
      appl   apple    0.783751
     recal    None    0.000000
     house   mouse    0.587927
similarity    None    0.000000