SSNMF
SSNMF contains a class for the (SS)NMF model and several multiplicative update methods to train different models.
Installation
To install SSNMF, run this command in your terminal:
$ pip install -U ssnmf
This is the preferred method to install SSNMF, as it will always install the most recent stable release.
If you don't have pip installed, these installation instructions can guide you through the process.
Usage
First, import the ssnmf package and the relevant class SSNMF. We also import numpy and scipy for experimentation.
>>> import ssnmf
>>> from ssnmf import SSNMF
>>> import numpy as np
>>> import scipy
>>> import scipy.sparse as sparse
>>> import scipy.optimize
Training an unsupervised model
Declare an unsupervised NMF model with data matrix X and number of topics k.
>>> X = np.random.rand(100,100)
>>> k = 10
>>> model = SSNMF(X,k)
You may access the factor matrices initialized in the model, e.g., to check the relative reconstruction error ||X-AS||_F/||X||_F.
>>> rel_error = np.linalg.norm(model.X - model.A @ model.S, 'fro')/np.linalg.norm(model.X,'fro')
Run the multiplicative updates method for this unsupervised model for N iterations. This method attempts to minimize the objective function ||X-AS||_F.
>>> N = 100
>>> model.mult(numiters = N)
This method updates the factor matrices N times. You can see how much the relative reconstruction error improves.
>>> rel_error = np.linalg.norm(model.X - model.A @ model.S, 'fro')/np.linalg.norm(model.X,'fro')
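For intuition, the kind of multiplicative update this method is based on, the classic Lee–Seung rules for minimizing ||X-AS||_F, can be sketched in plain NumPy. This is an illustrative sketch only; SSNMF's internal implementation may differ in details such as initialization, stopping criteria, and numerical safeguards.

```python
import numpy as np

# Illustrative sketch of Lee-Seung multiplicative updates for ||X - AS||_F.
rng = np.random.default_rng(0)
X = rng.random((100, 100))
k = 10
A = rng.random((100, k))
S = rng.random((k, 100))
eps = 1e-10  # guard against division by zero

rel_error_init = np.linalg.norm(X - A @ S, 'fro') / np.linalg.norm(X, 'fro')

for _ in range(100):
    # Update S: S <- S * (A^T X) / (A^T A S)
    S *= (A.T @ X) / (A.T @ A @ S + eps)
    # Update A: A <- A * (X S^T) / (A S S^T)
    A *= (X @ S.T) / (A @ S @ S.T + eps)

rel_error = np.linalg.norm(X - A @ S, 'fro') / np.linalg.norm(X, 'fro')
```

Because the updates are elementwise multiplications by nonnegative ratios, A and S stay nonnegative throughout, and the objective decreases monotonically.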
Training a supervised model
We begin by generating some synthetic data for testing.
>>> labelmat = np.concatenate((np.concatenate((np.ones([1,10]),np.zeros([1,30])),axis=1),np.concatenate((np.zeros([1,10]),np.ones([1,10]),np.zeros([1,20])),axis=1),np.concatenate((np.zeros([1,20]),np.ones([1,10]),np.zeros([1,10])),axis=1),np.concatenate((np.zeros([1,30]),np.ones([1,10])),axis=1)))
>>> B = sparse.random(4,10,density=0.2).toarray()
>>> S = np.zeros([10,40])
>>> for i in range(40):
...     S[:,i] = scipy.optimize.nnls(B,labelmat[:,i])[0]
>>> A = np.random.rand(40,10)
>>> X = A @ S
Declare a supervised NMF model with data matrix X, number of topics k, label matrix Y, and weight parameter lam.
>>> k = 10
>>> model = SSNMF(X,k,Y = labelmat,lam=100*np.linalg.norm(X,'fro'))
You may access the factor matrices initialized in the model, e.g., to check the relative reconstruction error ||X-AS||_F/||X||_F and the classification accuracy.
>>> rel_error = np.linalg.norm(model.X - model.A @ model.S, 'fro')/np.linalg.norm(model.X,'fro')
>>> acc = model.accuracy()
Run the multiplicative updates method for this supervised model for N iterations. This method attempts to minimize the objective function ||X-AS||_F^2 + lam ||Y - BS||_F^2, and it also saves the errors and accuracies at each iteration.
>>> N = 100
>>> [errs,reconerrs,classerrs,classaccs] = model.snmfmult(numiters = N,saveerrs = True)
This method updates the factor matrices N times. You can see how much the relative reconstruction error and classification accuracy improve.
>>> rel_error = reconerrs[99]/np.linalg.norm(X,'fro')
>>> acc = classaccs[99]
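To make the tracked quantities concrete, the supervised objective and a standard notion of classification accuracy can be computed directly from the factor matrices. This is a sketch under the assumption that a sample's predicted class is the row of B @ S with the largest entry in that sample's column; how SSNMF computes accuracy internally may differ.

```python
import numpy as np

# One-hot label matrix: 4 classes, 10 samples each (4 x 40), as in the
# synthetic data above.
Y = np.eye(4)[:, np.repeat(np.arange(4), 10)]

rng = np.random.default_rng(0)
A = rng.random((40, 10))
S = rng.random((10, 40))
B = rng.random((4, 10))
X = rng.random((40, 40))
lam = 100 * np.linalg.norm(X, 'fro')

# Supervised objective: ||X - AS||_F^2 + lam * ||Y - BS||_F^2
obj = (np.linalg.norm(X - A @ S, 'fro')**2
       + lam * np.linalg.norm(Y - B @ S, 'fro')**2)

# Accuracy: fraction of columns whose largest entry of B @ S falls in the
# row of the true class.
pred = np.argmax(B @ S, axis=0)
true = np.argmax(Y, axis=0)
acc = np.mean(pred == true)
```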
Training a supervised model with KL-divergence
We begin by generating some synthetic data for testing.
>>> labelmat = np.concatenate((np.concatenate((np.ones([1,10]),np.zeros([1,30])),axis=1),np.concatenate((np.zeros([1,10]),np.ones([1,10]),np.zeros([1,20])),axis=1),np.concatenate((np.zeros([1,20]),np.ones([1,10]),np.zeros([1,10])),axis=1),np.concatenate((np.zeros([1,30]),np.ones([1,10])),axis=1)))
>>> B = sparse.random(4,10,density=0.2).toarray()
>>> S = np.zeros([10,40])
>>> for i in range(40):
...     S[:,i] = scipy.optimize.nnls(B,labelmat[:,i])[0]
>>> A = np.random.rand(40,10)
>>> X = A @ S
Declare a supervised NMF model with data matrix X, number of topics k, label matrix Y, and weight parameter lam.
>>> k = 10
>>> model = SSNMF(X,k,Y = labelmat,lam=100*np.linalg.norm(X,'fro'))
You may access the factor matrices initialized in the model, e.g., to check the relative reconstruction error ||X-AS||_F/||X||_F, the classification accuracy, and the KL-divergence.
>>> rel_error = np.linalg.norm(model.X - model.A @ model.S, 'fro')/np.linalg.norm(model.X,'fro')
>>> acc = model.accuracy()
>>> div = model.kldiv()
Run the multiplicative updates method for this supervised model for N iterations. This method attempts to minimize the objective function ||X-AS||_F^2 + lam D(Y||BS), and it also saves the errors and accuracies at each iteration.
>>> N = 100
>>> [errs,reconerrs,classerrs,classaccs] = model.klsnmfmult(numiters = N,saveerrs = True)
This method updates the factor matrices N times. You can see how much the relative reconstruction error, classification accuracy, and KL-divergence improve.
>>> rel_error = reconerrs[99]/np.linalg.norm(X,'fro')
>>> acc = classaccs[99]
>>> div = classerrs[99]
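The label term in this variant is a divergence rather than a Frobenius norm. A common choice for nonnegative matrices is the generalized KL (I-)divergence, D(Y||BS) = sum(Y log(Y/BS) - Y + BS), with the convention 0 log 0 = 0. The sketch below computes it under that assumption; the exact form SSNMF's kldiv method uses may differ.

```python
import numpy as np

# One-hot label matrix (4 x 40) and random nonnegative factors, as above.
Y = np.eye(4)[:, np.repeat(np.arange(4), 10)]
rng = np.random.default_rng(0)
B = rng.random((4, 10))
S = rng.random((10, 40))
BS = B @ S

# Generalized KL divergence D(Y || BS); entries where Y == 0 contribute
# only their BS term, since 0 * log 0 = 0.
mask = Y > 0
div = np.sum(Y[mask] * np.log(Y[mask] / BS[mask])) - Y.sum() + BS.sum()
```

Each term x log(x/y) - x + y is nonnegative for x, y >= 0, so the divergence is zero exactly when Y = BS.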
Citing
If you use our code in an academic setting, please consider citing our code.
Development
See CONTRIBUTING.md for information related to developing the code.