
SWeeP is a tool for representing large biological sequence datasets as compact vectors

Project description

This package is a Python implementation of the tool described in the article available at <https://www.nature.com/articles/s41598-019-55627-4>. If you use it, please cite the article. Currently, only amino acid sequence vectorization is available.

Use

To use SWeeP in Python, install the package with the command “pip install sweep” and import it in your code, as in the example:

from sweep import fastaread, fas2sweep
fasta = fastaread("fasta_file_path")  # read the sequences from a FASTA file
vect = fas2sweep(fasta)  # vectorize them with the default projection matrix

The output is the already-projected matrix, with 600 columns. See the article for details of the projection method.
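To make the projection step more concrete, here is a toy pure-Python sketch of the idea as the article describes it: spaced words are read from each sequence through a mask, their presence is recorded in a very high-dimensional vector, and that vector is multiplied by a projection matrix. Everything here is an illustrative assumption, not the package's code: the mask strings, the base-20 word encoding, the binary (presence/absence) vector, and the helper names spaced_word_indices and project are hypothetical. Under this encoding, a mask with four informative positions (for example the hypothetical "11011") gives 20**4 = 160000 possible words, which is consistent with the 160000 rows of the default matrix; the toy run uses a two-position mask to stay small.

```python
import random

# Hypothetical 20-letter alphabet of standard amino acids.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def spaced_word_indices(seq, mask):
    """Return the set of spaced-word indices found in seq (hypothetical helper).

    Positions marked '1' in the mask are kept and '0' positions are skipped.
    Each kept word is encoded as a base-20 number, so a mask with k ones
    maps into a space of 20**k possible words.
    """
    kept = [i for i, m in enumerate(mask) if m == "1"]
    found = set()
    for start in range(len(seq) - len(mask) + 1):
        window = seq[start:start + len(mask)]
        if any(window[i] not in INDEX for i in kept):
            continue  # skip words with non-standard residues at kept positions
        code = 0
        for i in kept:
            code = code * 20 + INDEX[window[i]]
        found.add(code)
    return found


def project(indices, matrix):
    """Multiply a 0/1 presence vector (given as a set of indices) by matrix.

    For a binary vector, the product is the sum of the selected matrix rows.
    """
    out = [0.0] * len(matrix[0])
    for idx in indices:
        for j, value in enumerate(matrix[idx]):
            out[j] += value
    return out


if __name__ == "__main__":
    mask = "101"  # hypothetical mask with 2 informative positions
    rows = 20 ** mask.count("1")  # 400 possible spaced words
    rng = random.Random(0)
    # Random Gaussian matrix for illustration only; the article describes
    # projecting onto an orthonormal basis instead.
    matrix = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(rows)]
    words = spaced_word_indices("MKTAYIAKQR", mask)
    print(len(words), project(words, matrix))
```

Because the vector is 0/1, the matrix product reduces to summing the selected rows, so a sparse set of indices is enough. In this sketch, changing the number of informative positions in the mask changes the number of rows (20**k), which illustrates why a different mask calls for a different projection matrix.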

The default projection matrix has dimensions 160000x600. A new matrix must be generated if other masks are used or a different projection size is desired. The package provides a function, orthbase, that generates the orthonormal projection matrix. For example, to change the projection size to 300, use:

from sweep import fastaread, fas2sweep, orthbase
ob = orthbase(160000, 300)  # orthonormal projection matrix of size 160000x300
fasta = fastaread("fasta_file_path")
vect = fas2sweep(fasta, orthMat=ob)  # project with the custom matrix (300 columns)
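With the package, orthbase is the function to use. As a hypothetical illustration of what an orthonormal projection matrix is, the sketch below builds a small matrix whose columns are orthonormal, using modified Gram-Schmidt on random Gaussian vectors. The function name orthonormal_columns and the choice of Gram-Schmidt are assumptions for illustration; the package may construct its matrix differently.

```python
import math
import random


def orthonormal_columns(n_rows, n_cols, seed=0):
    """Return an n_rows x n_cols matrix (list of rows) with orthonormal columns.

    Hypothetical helper: applies modified Gram-Schmidt to random Gaussian
    vectors. Requires n_cols <= n_rows, since at most n_rows vectors of
    length n_rows can be orthonormal.
    """
    if n_cols > n_rows:
        raise ValueError("n_cols must not exceed n_rows")
    rng = random.Random(seed)
    basis = []  # orthonormal column vectors found so far
    while len(basis) < n_cols:
        v = [rng.gauss(0, 1) for _ in range(n_rows)]
        # Remove the component along each existing basis vector, using the
        # already-updated v each time (the "modified" variant).
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-10:  # discard a (rare) nearly dependent draw
            basis.append([x / norm for x in v])
    # Transpose the columns into rows so the shape is n_rows x n_cols,
    # the same orientation as the 160000x600 default matrix.
    return [list(row) for row in zip(*basis)]
```

For example, orthonormal_columns(50, 5) returns a 50x5 matrix whose columns have unit length and are mutually orthogonal. The pure-Python loops would be far too slow at 160000 rows, so this only shows the property; it is not a replacement for orthbase(160000, 300).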

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

sweep-1.0.0.1-py3-none-any.whl (10.1 kB), uploaded for Python 3