Make record linkages in followthemoney data.

nomenklatura

Nomenklatura de-duplicates and integrates different Follow the Money entities. It serves to clean up messy data and to find links between different datasets.

Design

This package will offer an in-memory data deduplication framework built around the Follow the Money (FtM) data model. The intended workflow is:

  • Accept FtM-shaped entities from a given loader (e.g. a JSON file, or a database)
  • Build an in-memory inverted index of the entities for blocking
  • Generate merge candidates using the blocking index and FtM compare
  • Provide a file-based storage format for merge challenges and decisions
  • Provide a text-based user interface to let users make merge decisions
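
The first four steps above can be sketched in a few lines of typed Python. This is only an illustration under stated assumptions, not nomenklatura's actual API: the function names (tokenize, build_index, candidates, decision_line), the sample entities and the JSON layout of a decision record are all hypothetical, and the shared-token count is a simple stand-in for FtM compare scoring.

```python
import json
import re
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, Iterable, List, Set, Tuple

# An FtM-shaped entity as a plain dict: id, schema and multi-valued properties.
Entity = Dict[str, Any]
Pair = Tuple[str, str]

# Hypothetical sample data standing in for a loader (JSON file, database).
ENTITIES: List[Entity] = [
    {"id": "a", "schema": "Person", "properties": {"name": ["John Smith"]}},
    {"id": "b", "schema": "Person", "properties": {"name": ["Smith, John"]}},
    {"id": "c", "schema": "Company", "properties": {"name": ["Acme Holdings Ltd"]}},
    {"id": "d", "schema": "Person", "properties": {"name": ["Jon Smith"]}},
]


def tokenize(entity: Entity) -> Set[str]:
    """Lower-cased word tokens from all of the entity's name values."""
    tokens: Set[str] = set()
    for name in entity.get("properties", {}).get("name", []):
        tokens.update(re.findall(r"\w+", name.lower()))
    return tokens


def build_index(entities: Iterable[Entity]) -> Dict[str, Set[str]]:
    """Inverted index used for blocking: token -> ids of entities containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entity in entities:
        for token in tokenize(entity):
            index[token].add(entity["id"])
    return index


def candidates(index: Dict[str, Set[str]]) -> List[Tuple[Pair, int]]:
    """Pairs sharing at least one token, ranked by number of shared tokens.

    The count is a placeholder score; a real implementation would compare
    the two entities property by property.
    """
    scores: Dict[Pair, int] = defaultdict(int)
    for ids in index.values():
        for left, right in combinations(sorted(ids), 2):
            scores[(left, right)] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def decision_line(left: str, right: str, judgement: str) -> str:
    """Serialise one merge decision as a single JSON line (assumed format)."""
    record = {"left": left, "right": right, "judgement": judgement}
    return json.dumps(record, sort_keys=True)


ranked = candidates(build_index(ENTITIES))
for (left, right), score in ranked:
    print(left, right, score)
print(decision_line("a", "b", "positive"))
```

Blocking keeps the comparison cheap: only entities that share an index key are ever paired, so the company "c" is never compared against the three persons. Writing one decision per line would let the file be appended to and diffed easily, which suits a text-based review workflow.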

Later on, the following might be added:

  • A web application to let users make merge decisions on the web
  • An implementation of the OpenRefine Reconciliation API based on the blocking index

This will be done in typed Python 3.

Reading

Contact, contributions etc.

This codebase is licensed under the terms of an MIT license (see LICENSE).

We welcome contributions, bug fixes and feature suggestions. Please use the GitHub issue tracker for this repository.

Nomenklatura is currently developed thanks to a Prototypefund grant for OpenSanctions. Previous iterations of the package were developed with support from Knight-Mozilla OpenNews and the Open Knowledge Foundation Labs.
