SWeeP is a tool for representing large biological sequence datasets as compact vectors
Project description
This package is a Python version of the tool described in the article available at <https://www.nature.com/articles/s41598-019-55627-4>. Please cite the article.
Use
To use SWeeP in Python, install the package with the command “pip install sweep” and import it in your code, as in this example:
from sweep import fastaread, fas2sweep
fasta = fastaread("fasta_file_path")
vect = fas2sweep(fasta)
The default settings are intended for vectorizing amino acid sequences. By default, the output is the already projected matrix, with 600 columns. See the article for details about the projection method.
The default projection matrix has dimensions 160000x600. A new matrix must be generated if other masks are used or if a different projection size is desired. To generate the orthonormal projection matrix, the package provides a function called orthbase. For example, to change the projection size to 300, use:
from sweep import fastaread, fas2sweep, orthbase
ob = orthbase(160000, 300)
fasta = fastaread("fasta_file_path")
vect = fas2sweep(fasta, orth_mat=ob)
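To make the idea of projecting onto an orthonormal matrix more concrete, the following is a minimal, standard-library-only sketch. It is not the package's orthbase implementation: the helper names random_orthonormal_columns and project are hypothetical, the Gram-Schmidt construction is an assumption chosen for illustration, and the sizes are toy values (40 to 5) rather than 160000 to 600 or 300.

```python
import random


def random_orthonormal_columns(n_rows, n_cols, seed=0):
    """Return n_cols orthonormal vectors of length n_rows.

    Hypothetical helper: builds the basis from random Gaussian vectors
    with modified Gram-Schmidt. The real orthbase may work differently.
    """
    if n_cols > n_rows:
        raise ValueError("n_cols must not exceed n_rows")
    rng = random.Random(seed)
    basis = []
    while len(basis) < n_cols:
        v = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
        # Remove the components along the vectors already in the basis.
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > 1e-12:  # skip a (very unlikely) degenerate draw
            basis.append([x / norm for x in v])
    return basis


def project(vectors, basis):
    """Project each high-dimensional row vector onto the basis vectors."""
    return [[sum(x * y for x, y in zip(vec, b)) for b in basis]
            for vec in vectors]


# Toy data: 4 "sequences", each represented by a 40-dimensional vector.
basis = random_orthonormal_columns(40, 5)
high_dim = [[float((i + j) % 3) for j in range(40)] for i in range(4)]
low_dim = project(high_dim, basis)
print(len(low_dim), len(low_dim[0]))  # 4 5
```

In this sketch, each input row keeps its position and only the number of columns shrinks. In the same way, the package's default output has one 600-column row per sequence.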
It is also possible to obtain the result without projection. To do so, set the parameter “projection” to “False”.
To vectorize nucleotide sequences, set the parameter fasta_type to “NT”.