High-dimensional embedding generation library
HiDi is a library that generates high-dimensional embeddings for collaborative filtering applications.
How Do I Use It?
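The following example will get you started. It reads user-item pairs from a CSV file, turns them into item embeddings, and writes the embeddings to a CSV file.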
from hidi import inout, clean, matrix, pipeline

# CSV file with link_id and item_id columns
in_files = ['hidi/examples/data/user-item.csv']

# File to write output data to
outfile = 'embeddings.csv'

transforms = [
    inout.ReadTransform(in_files),      # Read data from disk
    clean.DedupeTransform(),            # Dedupe it
    matrix.SparseTransform(),           # Make a sparse user*item matrix
    matrix.SimilarityTransform(),       # To item*item similarity matrix
    matrix.SVDTransform(),              # Perform SVD dimensionality reduction
    matrix.ItemsMatrixToDFTransform(),  # Make a DataFrame with an index
    inout.WriteTransform(outfile)       # Write results to csv
]

pl = pipeline.Pipeline(transforms)
pl.run()
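The transforms above run in sequence, each one consuming the previous one's output. As a rough, dependency-free illustration of that chaining idea (not HiDi's implementation), the sketch below strings together three hypothetical transforms. They dedupe (user, item) pairs, build a binary user*item matrix stored as sets, and count item*item co-occurrences, which are the off-diagonal entries of A^T A for a binary user*item matrix A. The class names, the `transform` method, and co-occurrence counting as the similarity measure are all assumptions. The SVD and DataFrame steps are left out to avoid third-party dependencies.

```python
from collections import defaultdict
from itertools import combinations


# Hypothetical stand-ins for the HiDi transforms; these are NOT HiDi's classes.
class DedupeRows:
    """Drop repeated (user, item) pairs, keeping first-seen order."""
    def transform(self, rows):
        seen = set()
        out = []
        for row in rows:
            if row not in seen:
                seen.add(row)
                out.append(row)
        return out


class UserItemMatrix:
    """Build a sparse binary user*item matrix as {user: set(items)}."""
    def transform(self, rows):
        matrix = defaultdict(set)
        for user, item in rows:
            matrix[user].add(item)
        return dict(matrix)


class ItemCooccurrence:
    """Item*item similarity as co-occurrence counts (off-diagonal A^T A)."""
    def transform(self, user_items):
        sim = defaultdict(int)
        for items in user_items.values():
            # Every pair of distinct items a user touched co-occurs once.
            for a, b in combinations(sorted(items), 2):
                sim[(a, b)] += 1
                sim[(b, a)] += 1
        return dict(sim)


class SimplePipeline:
    """Feed each transform's output into the next transform's input."""
    def __init__(self, transforms):
        self.transforms = transforms

    def run(self, data):
        for t in self.transforms:
            data = t.transform(data)
        return data


rows = [("u1", "a"), ("u1", "b"), ("u1", "a"),  # ("u1", "a") is a duplicate
        ("u2", "a"), ("u2", "b"),
        ("u3", "b"), ("u3", "c")]
pl = SimplePipeline([DedupeRows(), UserItemMatrix(), ItemCooccurrence()])
similarity = pl.run(rows)
print(similarity[("a", "b")])  # prints 2
```

Unlike `pipeline.Pipeline.run()` in the example above, this hypothetical `run` takes its input data as an argument, because the sketch has no read-from-disk step.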
Setup
Requirements
HiDi is tested against CPython 2.7, 3.4, 3.5, and 3.6. It may work with other versions of CPython.
Installation
To install HiDi, run:
$ pip install hidi
Run the Tests
$ pip install tox
$ tox
Source Distribution
HiDi-0.0.2.tar.gz (7.0 kB)