polyfuzz

PolyFuzz performs fuzzy string matching, grouping, and evaluation.

Project description

PolyFuzz performs fuzzy string matching, string grouping, and contains extensive evaluation functions. PolyFuzz is meant to bring fuzzy string matching techniques together within a single framework.

Currently supported methods include Levenshtein distance (via RapidFuzz), character-based n-gram TF-IDF, word embeddings such as FastText and GloVe, and 🤗 Transformers embeddings.
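
As a rough illustration of the edit-distance idea behind the Levenshtein method, here is a pure-Python sketch. It is not PolyFuzz's or RapidFuzz's implementation; the function names and the normalization (one minus distance divided by the longer string's length) are assumptions made for this example.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions
    needed to turn string a into string b."""
    # previous[j] is the distance between the prefix of a seen so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """Map the distance to [0, 1], where 1.0 means identical strings.
    This normalization is an assumption chosen for illustration."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def best_match(query, candidates):
    """Return the (candidate, similarity) pair with the highest similarity."""
    scored = ((c, normalized_similarity(query, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])


target, score = best_match("appl", ["apple", "apples", "mouse"])
```

Here `target` is the candidate closest to "appl" by edit distance. A library such as RapidFuzz computes the same kind of score far more efficiently.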

You can also use your own custom models for both fuzzy string matching and string grouping.
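
To give a feel for the logic a custom matching model might contain, here is a standalone sketch. The `RatioMatcher` class is hypothetical and does not subclass anything from PolyFuzz; the interface PolyFuzz actually expects from a custom model should be taken from its own documentation. The sketch scores pairs with `difflib.SequenceMatcher` from the standard library and returns rows keyed by From, To, and Similarity, mirroring the columns in the Getting Started output. The `min_similarity` threshold is an assumption.

```python
from difflib import SequenceMatcher


class RatioMatcher:
    """Hypothetical standalone matcher, not tied to PolyFuzz's classes."""

    def __init__(self, min_similarity=0.5):
        # Matches scoring below this threshold are reported as "no match"
        self.min_similarity = min_similarity

    def match(self, from_list, to_list):
        rows = []
        for source in from_list:
            best_target, best_score = None, 0.0
            # Keep the highest-scoring candidate for this source string
            for candidate in to_list:
                score = SequenceMatcher(None, source, candidate).ratio()
                if score > best_score:
                    best_target, best_score = candidate, score
            if best_score < self.min_similarity:
                best_target, best_score = None, 0.0
            rows.append({"From": source, "To": best_target, "Similarity": best_score})
        return rows


matches = RatioMatcher().match(["apple", "appl", "recal"], ["apple", "apples", "mouse"])
```

Returning plain dictionaries keeps the sketch dependency-free; any scoring function could be swapped in for `SequenceMatcher` without changing the shape of the result.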

A corresponding Medium post is also available.

Getting Started

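from polyfuzz import PolyFuzz

# Strings to match (from_list) and the candidates to match them against (to_list)
from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

# Build a TF-IDF matcher and match every entry of from_list against to_list
model = PolyFuzz("TF-IDF").match(from_list, to_list)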

The resulting string matches can be accessed through model.get_matches():

>>> model.get_matches()
      From      To  Similarity
     apple   apple    1.000000
    apples  apples    1.000000
      appl   apple    0.783751
     recal    None    0.000000
     house   mouse    0.587927
similarity    None    0.000000
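
The zero scores for "recal" and "similarity" are what one would expect from a character n-gram approach: neither string shares any three-character sequence with an entry in to_list. The sketch below illustrates that idea in pure Python. It is not PolyFuzz's implementation, so its scores will not reproduce the table; the trigram size, the smoothed IDF formula, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """All overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(strings, n=3):
    """One sparse TF-IDF vector (dict of n-gram -> weight) per string."""
    docs = [Counter(char_ngrams(s, n)) for s in strings]
    # Document frequency: in how many strings each n-gram occurs
    doc_freq = Counter(gram for doc in docs for gram in doc)
    total = len(docs)
    vectors = []
    for doc in docs:
        # Smoothed IDF (an assumption) keeps every weight positive
        vectors.append({
            gram: tf * (math.log((1 + total) / (1 + doc_freq[gram])) + 1)
            for gram, tf in doc.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[gram] for gram, w in u.items() if gram in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match(from_list, to_list, n=3):
    """For each source string, return (source, best target or None, score)."""
    vectors = tfidf_vectors(from_list + to_list, n)
    from_vecs = vectors[:len(from_list)]
    to_vecs = vectors[len(from_list):]
    results = []
    for source, u in zip(from_list, from_vecs):
        scores = [cosine(u, v) for v in to_vecs]
        top = max(range(len(to_list)), key=scores.__getitem__)
        # No shared n-grams at all means there is nothing to match
        target = to_list[top] if scores[top] > 0 else None
        results.append((source, target, scores[top]))
    return results


from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]
results = match(from_list, to_list)
```

In this sketch "house" still pairs with "mouse" because the two share the trigrams "ous" and "use", while "recal" and "similarity" share none with any candidate and are left unmatched.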

Release history

0.4.2    Sep 3, 2023
0.4.1    Sep 3, 2023
0.4.0    May 7, 2022
0.3.4    Nov 5, 2021
0.3.3    Jun 16, 2021
0.3.2    Jun 8, 2021
0.3.1    Jun 8, 2021
0.3.0    Apr 30, 2021
0.2.2    Dec 7, 2020
0.2.1    Nov 28, 2020
0.2.0    Nov 27, 2020
0.0.1    Nov 24, 2020 (this version)

Download files

Source Distribution

polyfuzz-0.0.1.tar.gz (14.6 kB), uploaded Nov 24, 2020

Built Distribution

polyfuzz-0.0.1-py2.py3-none-any.whl (21.4 kB), uploaded Nov 24, 2020, for Python 2 and Python 3

Hashes for polyfuzz-0.0.1.tar.gz

Algorithm	Hash digest
SHA256	`1fa20151f0f5c62e2b3368eb42b89b61db1bd966bfb6bfd5728496f5989ba624`
MD5	`ef590df45da3a8637f9f70ac4599b9e4`
BLAKE2b-256	`f8db18c9df7923bf65628b82b2cc74739b3e6cc15656acc7f250198017b0b78b`

Hashes for polyfuzz-0.0.1-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`cccc4d587cc111cd8bb2b804d583772fbdcc29c732105871eb9d23815d520ea7`
MD5	`cd6cde4fbc5dc63d1aaf1094cc7237b0`
BLAKE2b-256	`70b939fc9efc204e4a3e122650f92ddc55913e1982f8f0bbac1cd172950216a3`