Mismo

Goals

Use Ibis as the core

This gives a few benefits that are key to record linkage:

  • Ability to work with datasets that are larger than memory
  • Ability to use multiple backends (e.g. DuckDB on a single node, or BigQuery or Spark for distributed workloads)

Thoughtful, composable API

Use a duck-typing approach so that users can plug in their own components. For example, a "Blocker" is any object that has a block method with a certain signature. This makes mismo a bit more complicated than dedupe or splink, but much more flexible.
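
As a rough illustration of the duck-typing idea, here is a minimal sketch in plain Python. The Blocker protocol, the block signature, FirstLetterBlocker, and candidate_pairs are all hypothetical names chosen for this example. They are not mismo's actual API, which is built on Ibis tables rather than lists of dicts.

```python
from typing import Iterable, Protocol, runtime_checkable

Record = dict[str, str]
Pair = tuple[Record, Record]


@runtime_checkable
class Blocker(Protocol):
    """Hypothetical interface: any object with a matching block() qualifies."""

    def block(self, left: Iterable[Record], right: Iterable[Record]) -> list[Pair]: ...


class FirstLetterBlocker:
    """Pairs records whose 'name' fields start with the same letter.

    This class does not inherit from Blocker; it only has the right method.
    """

    def block(self, left, right):
        right = list(right)  # materialize so it can be scanned once per left record
        return [
            (l, r)
            for l in left
            for r in right
            if l["name"][:1].lower() == r["name"][:1].lower()
        ]


def candidate_pairs(blocker: Blocker, left, right) -> list[Pair]:
    # Works with any object that provides block(); no base class is required.
    return blocker.block(left, right)


left = [{"name": "Alice"}, {"name": "Bob"}]
right = [{"name": "alicia"}, {"name": "Carol"}]
pairs = candidate_pairs(FirstLetterBlocker(), left, right)
```

The point of the sketch is that candidate_pairs never checks for a base class, so a user-written component can be dropped in as long as its block method has the expected shape.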

Extras

  • More ergonomic model persistence than dedupe. splink did a good job here.
  • Support deterministic results via a random_state parameter (unlike dedupe)
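
One common way to get this kind of determinism is to route all randomness through a generator seeded from random_state. The sketch below shows the pattern with the standard library only. The sample_pairs function is a hypothetical helper for illustration, not part of mismo.

```python
import random
from itertools import combinations


def sample_pairs(records, n, random_state=None):
    """Draw n candidate pairs; the same random_state yields the same sample."""
    rng = random.Random(random_state)  # local generator, global state is untouched
    all_pairs = list(combinations(records, 2))
    return rng.sample(all_pairs, n)


records = ["a", "b", "c", "d", "e"]
first = sample_pairs(records, 3, random_state=42)
second = sample_pairs(records, 3, random_state=42)
```

Because each call builds its own seeded generator, first and second are identical, and the result does not depend on any other code that uses the global random module.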

License

mismo is distributed under the terms of the LGPL-3.0-or-later license.
