A package for storing and querying knowledge graph embeddings

This package provides a database schema and Python wrapper for storing the embeddings generated through various representation learning packages.

Currently, this package stores embeddings in a SQL database through SQLAlchemy. Support for a NoSQL database as an alternative backend might be added later.

Installation

Install embeddingdb directly from GitHub with:

$ pip install git+https://github.com/cthoyt/embeddingdb

Set the environment variable EMBEDDINGDB_CONNECTION to a valid SQLAlchemy connection string for a PostgreSQL instance, as this package uses the PostgreSQL-specific ARRAY type.
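
As a minimal sketch of checking this setting before running anything, the snippet below reads EMBEDDINGDB_CONNECTION and verifies that its dialect is PostgreSQL. The helper name get_connection_string and the example credentials are hypothetical and not part of embeddingdb; only the environment variable name comes from this README.

```python
import os
from urllib.parse import urlsplit


def get_connection_string():
    """Return EMBEDDINGDB_CONNECTION after checking that it targets PostgreSQL.

    Hypothetical helper for illustration; embeddingdb may validate differently.
    """
    connection = os.environ.get("EMBEDDINGDB_CONNECTION")
    if connection is None:
        raise RuntimeError("EMBEDDINGDB_CONNECTION is not set")
    # SQLAlchemy URLs have the form dialect[+driver]://user:password@host:port/database
    dialect = urlsplit(connection).scheme.split("+")[0]
    if dialect != "postgresql":
        raise ValueError(f"expected a PostgreSQL connection string, got dialect {dialect!r}")
    return connection
```

A value such as postgresql+psycopg2://user:secret@localhost:5432/embeddingdb (placeholder credentials) would pass this check, while a SQLite URL would be rejected.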

Command Line Interface

This package installs an entrypoint embeddingdb that can be used directly from the shell.

Uploading Entity Embeddings

Entities can be embedded and stored from various types of representation learning, including network representation learning, knowledge graph embedding, and textual learning.

Upload embeddings generated by word2vec by specifying the file path with:

$ embeddingdb upload --fmt word2vec --path ~/path/to/file.txt
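
For orientation, the word2vec text format usually starts with a header line giving the number of entries and the vector dimension, followed by one line per entity with its label and vector components. The sketch below parses that layout with the standard library only. The function name read_word2vec_text is hypothetical, and whether embeddingdb requires the header line is an assumption that this README does not confirm.

```python
def read_word2vec_text(lines):
    """Parse word2vec text format into a dict mapping each label to its vector.

    Assumes a 'count dimension' header followed by 'label v1 v2 ...' rows.
    """
    lines = iter(lines)
    count, dimension = (int(part) for part in next(lines).split())
    embeddings = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines, e.g. a trailing newline
        label, values = parts[0], [float(value) for value in parts[1:]]
        if len(values) != dimension:
            raise ValueError(f"{label!r} has {len(values)} values, expected {dimension}")
        embeddings[label] = values
    if len(embeddings) != count:
        raise ValueError(f"header declares {count} entries, found {len(embeddings)}")
    return embeddings
```

Passing an open file handle works, since the function only iterates over lines, which makes it a quick way to sanity-check a file before uploading it.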

Upload embeddings generated by pykeen by specifying the output directory with:

$ embeddingdb upload --fmt keen --path ~/path/to/directory/

Listing Entity Embeddings

After uploading, the collections can be listed with:

$ embeddingdb ls

Analyzing Entity Embeddings’ Correlations

One motivation for building this repository was to provide a convenient way to compare the embeddings for entities generated through orthogonal embedding techniques. For example, we wanted to know to what extent the embeddings for proteins generated from their sequences with ratvec contained the same information as the embeddings generated from protein-protein interaction networks with pykeen or nrl.

Compare two collections by passing their identifiers in the database as the two positional arguments:

$ embeddingdb analyze 1 2
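
This README does not say which statistic the analyze command computes. As an illustration of one way to compare two embedding spaces that may differ in dimension, the sketch below correlates the pairwise cosine similarities of the entities the two collections share. All function names are hypothetical, and this is not necessarily what embeddingdb does internally.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)


def similarity_structure_correlation(first, second):
    """Correlate pairwise similarities of the entities present in both collections.

    Each argument maps an entity label to its vector. The two collections may
    use different dimensions, because vectors are only compared within a collection.
    """
    shared = sorted(set(first) & set(second))
    pairs = list(combinations(shared, 2))
    similarities_first = [cosine(first[a], first[b]) for a, b in pairs]
    similarities_second = [cosine(second[a], second[b]) for a, b in pairs]
    return pearson(similarities_first, similarities_second)
```

A value near 1 suggests that entities that are close in one space are also close in the other. At least three shared entities with differing similarities are needed for the correlation to be defined.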

Running with Docker

After installing Docker, the entire web application can be instantiated with:

$ docker-compose up

Send a GET request to the /test endpoint to instantiate the database and add a test collection.
