
A dict with a vector index for fast lookup of nearest neighbors

Project description

vdict


This is a very thin wrapper around hnswlib that makes it look like a Python dictionary whose keys are NumPy arrays. Install with pip install vdict.

from vdict import vdict
import numpy as np

data = vdict()
v1 = np.random.rand(32)
v2 = np.random.rand(32)
data[v1] = 'hello'
data[v2] = 32
assert data[v1] == 'hello'
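
To make the dictionary behaviour concrete, here is a minimal exact (brute-force) sketch of the same idea. It is not vdict's implementation: BruteForceVDict is a hypothetical name, it scans every key instead of querying an hnswlib index, it uses tuples of floats instead of NumPy arrays so it has no dependencies, and overwriting an identical key is a choice made for this sketch only.

```python
import math
from typing import Any, Sequence


class BruteForceVDict:
    """Hypothetical exact stand-in for vdict: a lookup returns the value
    stored under the key closest (Euclidean distance) to the query."""

    def __init__(self) -> None:
        self._keys = []    # list of float tuples, all the same length
        self._values = []  # values, parallel to self._keys

    def __setitem__(self, key: Sequence[float], value: Any) -> None:
        k = tuple(float(x) for x in key)
        if self._keys and len(k) != len(self._keys[0]):
            raise ValueError("all vectors must be the same length")
        # This sketch overwrites an identical key, as a plain dict would.
        for i, existing in enumerate(self._keys):
            if existing == k:
                self._values[i] = value
                return
        self._keys.append(k)
        self._values.append(value)

    def __getitem__(self, query: Sequence[float]) -> Any:
        if not self._keys:
            raise IndexError("dictionary is empty")
        q = tuple(float(x) for x in query)
        # Linear scan for the nearest key; vdict hands this step to hnswlib.
        best = min(range(len(self._keys)), key=lambda i: math.dist(self._keys[i], q))
        return self._values[best]

    def __len__(self) -> int:
        return len(self._keys)
```

For example, after d[(0.0, 0.0)] = 'hello' and d[(1.0, 1.0)] = 32, the lookup d[(0.1, -0.1)] returns 'hello' because (0.0, 0.0) is the nearest stored key. The linear scan costs O(n) per lookup, which is exactly what an approximate index like hnswlib avoids.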

You can make it raise an IndexError when you look up a key that doesn't exist by setting a tolerance:

from vdict import vdict
import numpy as np

data = vdict(tol=0.001)
v1 = np.random.rand(32)
v2 = np.random.rand(32)
data[v1] = 'hello'
# raises an IndexError because v2 was never added
print(data[v2])

The default tolerance is 1, which in practice means errors are rarely raised; set it to a smaller value to make lookups stricter.
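
One plausible reading of the tolerance, sketched as a standalone function: find the nearest stored key and raise IndexError if even that key is farther than tol. This is an assumption about how vdict uses tol, and lookup_with_tol is a hypothetical helper. In particular, the distance being compared is not stated above; this sketch uses plain Euclidean distance, whereas hnswlib's 'l2' space reports squared Euclidean distance.

```python
import math
from typing import Any, Sequence


def lookup_with_tol(
    keys: Sequence[Sequence[float]],
    values: Sequence[Any],
    query: Sequence[float],
    tol: float = 1.0,
) -> Any:
    """Return the value of the nearest key, or raise IndexError if the
    nearest key is more than `tol` away (Euclidean distance, an assumption)."""
    if not keys:
        raise IndexError("no keys stored")
    dists = [math.dist(k, query) for k in keys]
    best = min(range(len(keys)), key=dists.__getitem__)
    if dists[best] > tol:
        # No stored key is close enough: treat the query as a missing key.
        raise IndexError(f"nearest key is {dists[best]:.3g} away (tol={tol})")
    return values[best]
```

With keys [(0.0, 0.0), (1.0, 1.0)], a query of (0.5, 0.4) returns the value for (0.0, 0.0) under the default tol=1.0 (its distance is about 0.64), but raises IndexError with tol=0.001.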

Details

  • All vectors must be the same length
  • Accessing with a vector returns the value stored under the closest key vector
  • Lookup uses approximate nearest neighbor search, so accuracy can be tuned (see below)
  • You can have millions of vectors in the dictionary
  • If you know the approximate size, pass est_nelements to vdict() to reduce how often the index is resized
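
Why a size estimate helps: an hnswlib index is created with a fixed maximum number of elements and has to be resized when it fills up. The sketch below counts resizes under a hypothetical growth policy (capacity multiplied by growth each time it fills); vdict's actual policy is not documented here, and count_resizes is an illustrative helper, not part of vdict.

```python
def count_resizes(n_inserts: int, initial_capacity: int, growth: float = 2.0) -> int:
    """Count how many times a fixed-capacity index would be resized while
    inserting n_inserts vectors, assuming capacity is multiplied by `growth`
    whenever it fills up (a hypothetical policy, not vdict's documented one)."""
    if initial_capacity < 1 or growth <= 1:
        raise ValueError("need initial_capacity >= 1 and growth > 1")
    capacity, resizes = initial_capacity, 0
    while capacity < n_inserts:
        # Always grow by at least one slot so the loop terminates.
        capacity = max(capacity + 1, int(capacity * growth))
        resizes += 1
    return resizes
```

Under this assumption, inserting a million vectors into an index that starts at 1,000 slots triggers 10 resizes, while starting with a capacity of a million (a good size estimate) triggers none.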

Usage

The vdict class has reasonable defaults, but you may need to tune them for your use case. They are adjustable in the constructor, and the hnswlib documentation describes each parameter. Briefly, the most important ones are:

  • M - the number of neighbors to consider when building the graph (higher M means more accurate, but more memory). 12-48 is typical.
  • space - the distance metric to use. The default is l2, but you can also use cosine or ip (inner product).
  • ef_construction - controls the speed/accuracy trade-off during index construction. 50-200 is typical.

from vdict import vdict
import numpy as np

data = vdict(M=16, space='cosine', ef_construction=100)

# add some vectors
data[np.random.rand(32)] = 'hello'
data[np.random.rand(32)] = 'world'
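
To help choose a space: hnswlib defines its three spaces as the distances below, where smaller means closer. This is a plain-Python sketch of those formulas for illustration; the function names are hypothetical and hnswlib computes them internally in optimized C++.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 'l2': squared Euclidean distance (no square root is taken).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 'ip': 1 minus the inner product; best suited to normalized vectors.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 'cosine': 1 minus cosine similarity, so vector length is ignored.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
```

For example, cosine_distance((1.0, 1.0), (2.0, 2.0)) is 0.0 while l2_distance gives 2.0 for the same pair, so with space='cosine' two keys pointing in the same direction are indistinguishable to a lookup.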

License

MIT

Author

Andrew White



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vdict-0.1.0.tar.gz (4.2 kB)

Uploaded Source

Built Distribution

vdict-0.1.0-py3-none-any.whl (4.3 kB)

Uploaded Python 3
