GISMO is an NLP tool to rank and organize a corpus of documents according to a query.

Project description

GISMO

Gismo stands for Generic Information Search… with a Mind of its Own.

Features

Gismo combines three main ideas:

  • TF-IDTF: a symmetric version of the TF-IDF embedding.

  • DI-Iteration: a fast, push-based variant of the PageRank algorithm.

  • Fuzzy dendrogram: a variant of the Louvain clustering algorithm.
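
To give an intuition for what "push-based" means in the second item, here is a minimal, self-contained sketch of a generic push-style PageRank approximation. It is not Gismo's implementation and does not use the gismo API: the function name push_pagerank, its parameters (graph, damping, tol) and the toy graph are all assumptions made for illustration. Each node holds some residual "fluid"; processing a node absorbs its fluid into its score and pushes a damped share to its out-neighbours, until every residual falls below a tolerance.

```python
from collections import deque


def push_pagerank(graph, damping=0.85, tol=1e-10):
    """Approximate PageRank by pushing residual fluid along out-links.

    Illustrative sketch only (hypothetical helper, not part of gismo).
    graph: dict mapping every node to the list of its out-neighbours;
           each neighbour must itself be a key of the dict.
    Returns a dict of scores normalised to sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    # residual[v]: fluid waiting at v; score[v]: fluid already absorbed by v.
    residual = {v: (1.0 - damping) / n for v in nodes}
    score = {v: 0.0 for v in nodes}
    queue = deque(nodes)
    queued = set(nodes)

    while queue:
        v = queue.popleft()
        queued.discard(v)
        fluid = residual[v]
        if fluid <= tol:
            continue  # too little fluid left here to be worth pushing
        score[v] += fluid
        residual[v] = 0.0
        out = graph[v]
        if not out:
            continue  # dangling node: its fluid simply leaves the system
        share = damping * fluid / len(out)
        for w in out:
            residual[w] += share
            if residual[w] > tol and w not in queued:
                queue.append(w)
                queued.add(w)

    total = sum(score.values())
    return {v: s / total for v, s in score.items()}


# Toy graph (made up for this example): two pages both link to a hub.
toy_graph = {"hub": [], "x": ["hub"], "y": ["hub"]}
ranks = push_pagerank(toy_graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Only nodes that still hold a non-negligible amount of fluid are revisited, which is why push-style schemes can be fast when the relevant part of the graph is small. The final normalisation is a design choice of this sketch, made so that scores remain comparable when dangling nodes lose fluid.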

Quickstart

Install gismo:

$ pip install gismo

Import gismo in a Python project:

import gismo as gs

Credits

Thomas Bonald, Anne Bouillard, Marc-Olivier Buob, Dohy Hong.

This package was created with Cookiecutter and the francois-durand/package_helper project template.

History

0.3.0 (2020-05-13)

  • dblp module: url2source function added to directly load a small dblp source in memory instead of using a FileSource approach.

  • Possibility to disable query distortion in gismo.

  • XGismo class to cross analyze embeddings.

  • Tutorials updated

0.2.5 (2020-05-11)

  • auto_k feature: if k is not specified, a reasonable, query-dependent number of results k is estimated.

  • covering methods added to gismo. It is now possible to use get_covering_* instead of get_ranked_* to maximize coverage and/or eliminate redundancy.

0.2.4 (2020-05-07)

  • Tutorials for ACM and DBLP added. After cleaning, there are currently three tutorials:
    • Toy model, to get the hang of Gismo on a tiny example,

    • ACM, to play with Gismo on a small example,

    • DBLP, to play with a large dataset.

0.2.3 (2020-05-04)

  • ACM and DBLP dataset creation added.

0.2.2 (2020-05-04)

  • Notebook tutorials added (early version)

0.2.1 (2020-05-03)

  • Actual code

  • Coverage badge

0.1.0 (2020-04-30)

  • First release on PyPI.

Download files

Source Distribution

gismo-0.3.0.tar.gz (30.9 kB)

Built Distribution

gismo-0.3.0-py2.py3-none-any.whl (29.0 kB), for Python 2 and Python 3