automlsa2

Automated Multi-Locus Sequence Analysis tool

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- Free for non-commercial use
Natural Language
- English
Operating System
- OS Independent
Programming Language
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

Installation

automlsa2 is distributed on PyPI as a universal wheel. It runs on Linux and macOS, is untested on Windows, and supports Python 3.7+ and PyPy.

$ pip install --upgrade automlsa2

Dependencies

Python modules:

pandas
numpy
biopython
tqdm

See requirements.txt for more info.

External programs:

You can install external programs using the automlsa2 --install_deps command. These will be installed to ${HOME}/.local/external unless otherwise specified.

Just tell me how to run it

$ automlsa2 --files Genus_species_1.fna Genus_species_2.fna ... \
  Genus_species_N.fna --query queries.fasta -t THREADS -- runID

Alternatively:

$ automlsa2 --dir path/to/genomes --query queries.fasta -t THREADS \
  -- runID

Overview

automlsa2 is a re-imagination of autoMLSA.pl.

The entire codebase has been rewritten in Python. The general algorithm produces similar output and several steps are shared, but there are many updates and differences between the two programs, which will be covered later.

The workflow can be summarized as follows:

1. Input is a set of marker genes as queries and a set of target genome FASTA files.
2. A BLAST database is generated for each target genome, and each query gene is extracted from the input query FASTA files.
3. BLAST searches are run with the extracted query sequences against the genome databases.
4. Hits per genome are counted according to the cut-offs, and genomes are filtered from the analysis accordingly.
5. Sequences are extracted from the BLAST results as unaligned multi-FASTA files.
6. The unaligned sequences are aligned using mafft.
7. A nexus file is generated that points to all aligned sequences.
8. A phylogenetic tree is generated using the nexus file as input.
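
The genome filtering step can be illustrated with a minimal sketch. This is not automlsa2's actual code: the function name `filter_genomes`, the tuple layout of a hit, and the threshold values are all assumptions made for illustration. It keeps a genome only if enough marker genes have at least one hit that passes the identity and coverage cut-offs.

```python
from collections import defaultdict


def filter_genomes(hits, genes, min_identity=50.0, min_coverage=50.0,
                   max_missing=0):
    """Return the sorted genomes that have passing hits for enough genes.

    hits: iterable of (genome, gene, percent_identity, query_coverage).
    genes: the marker genes the analysis expects.
    All names and default thresholds here are hypothetical.
    """
    wanted = set(genes)
    found = defaultdict(set)  # genome -> marker genes with a passing hit
    for genome, gene, pident, qcov in hits:
        if gene in wanted and pident >= min_identity and qcov >= min_coverage:
            found[genome].add(gene)
    required = len(wanted) - max_missing
    return sorted(g for g, got in found.items() if len(got) >= required)


hits = [
    ("A", "rpoB", 98.0, 100.0),
    ("A", "gyrB", 95.0, 99.0),
    ("B", "rpoB", 97.0, 100.0),
    ("B", "gyrB", 40.0, 99.0),  # fails the identity cut-off
]
kept = filter_genomes(hits, ["rpoB", "gyrB"])
```

With `max_missing=0`, genome `B` is dropped because its `gyrB` hit fails the identity cut-off. Raising `max_missing` tolerates genomes that lack some markers.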

BLAST searches are threaded, or, optionally, written to a file to be submitted to a compute cluster. mafft alignment commands can also be written to a file for submission to a compute cluster.
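
As a rough illustration of writing search commands to a file for later submission, the sketch below builds one command per genome database and formats them one per line. The helper names `blast_commands` and `format_commands`, the choice of `tblastn`, and the output layout are assumptions, not automlsa2's behavior. Only standard BLAST+ options (`-query`, `-db`, `-outfmt`, `-out`) are used.

```python
import shlex
from pathlib import Path


def blast_commands(query, databases, out_dir):
    """Yield one BLAST command (as an argument list) per genome database.

    The program name and output naming scheme are hypothetical.
    """
    for db in databases:
        out = f"{out_dir}/{Path(db).name}.tsv"
        yield ["tblastn", "-query", query, "-db", db,
               "-outfmt", "6", "-out", out]


def format_commands(commands):
    """Return the commands as shell-quoted text, one command per line."""
    lines = [" ".join(shlex.quote(str(arg)) for arg in cmd)
             for cmd in commands]
    return "\n".join(lines) + "\n"


def write_command_file(commands, path):
    """Write the formatted commands to path for a scheduler to consume."""
    Path(path).write_text(format_commands(commands))


cmds = list(blast_commands("queries.fasta", ["db/g1", "my db/g2"], "out"))
text = format_commands(cmds)
```

Quoting each argument with `shlex.quote` keeps paths that contain spaces intact when the file is later run by a shell or split into cluster jobs.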

Input query files and genome directories are scanned for updates; if sequences are added, removed, or changed, the analysis is redone.
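
One common way to detect added, removed, or changed inputs is to compare content checksums between runs. The sketch below assumes that approach; how automlsa2 actually tracks changes is not described here, and the names `checksum`, `scan`, and `diff_manifest` are hypothetical.

```python
import hashlib


def checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(paths):
    """Map each input path to its current checksum."""
    return {str(p): checksum(p) for p in paths}


def diff_manifest(previous, current):
    """Compare two {path: checksum} mappings.

    Returns (added, removed, changed) as sorted lists of paths.
    """
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(p for p in set(previous) & set(current)
                     if previous[p] != current[p])
    return added, removed, changed


old = {"a.fna": "1", "b.fna": "2"}
new = {"b.fna": "3", "c.fna": "4"}
added, removed, changed = diff_manifest(old, new)
```

If all three lists are empty, earlier results can be reused. Otherwise only the affected genomes or queries need to be reprocessed.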

Multiple queries targeting the same gene can be used to improve coverage of divergent gene sequences, for example when covering an entire phylum with multiple reference genomes.
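
To show how several queries for one gene might be reconciled, the sketch below keeps the highest-scoring hit for each genome and gene pair. Selecting by bitscore is an assumption made for illustration, as are the dictionary keys and the function name `best_hit_per_gene`. automlsa2's own selection rule may differ.

```python
def best_hit_per_gene(hits):
    """Keep the top-scoring hit for every (genome, gene) pair.

    hits: iterable of dicts with the hypothetical keys
    "genome", "gene", "query", and "bitscore".
    """
    best = {}
    for hit in hits:
        key = (hit["genome"], hit["gene"])
        if key not in best or hit["bitscore"] > best[key]["bitscore"]:
            best[key] = hit
    return best


hits = [
    {"genome": "G1", "gene": "rpoB", "query": "rpoB_refA", "bitscore": 310.0},
    {"genome": "G1", "gene": "rpoB", "query": "rpoB_refB", "bitscore": 845.0},
    {"genome": "G2", "gene": "rpoB", "query": "rpoB_refA", "bitscore": 790.0},
]
best = best_hit_per_gene(hits)
```

Here genome `G1` is best matched by the second reference query and `G2` by the first, so each genome contributes one `rpoB` sequence regardless of which reference found it.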

Author Contact

Ed Davis

Acknowledgments

Special thanks for helping me test the software and get the Python code packaged:

Also, thanks to these groups for supporting me through my scientific career:

License

automlsa2 is distributed under the terms listed in the LICENSE file. The software is free for non-commercial use.

Copyrights

Release history

- 0.9.0: Jan 31, 2023
- 0.8.1: Mar 4, 2022
- 0.8.0: Feb 11, 2022
- 0.7.1: Apr 7, 2021
- 0.7.0: Dec 12, 2020
- 0.6.1: Nov 21, 2020
- 0.6.0: Nov 21, 2020
- 0.5.2: Nov 4, 2020
- 0.5.1: Oct 29, 2020
- 0.5.0: Oct 29, 2020
- 0.4.0: Oct 29, 2020
- 0.3.1: Oct 22, 2020
- 0.3.0: Oct 21, 2020
- 0.2.0: Oct 16, 2020 (this version)
- 0.1.2: Oct 15, 2020
- 0.1.1: Oct 15, 2020
- 0.1.0: Oct 15, 2020

Download files

Source Distribution

automlsa2-0.2.0.tar.gz (22.9 kB)

Uploaded Oct 16, 2020 Source

Built Distribution

automlsa2-0.2.0-py3-none-any.whl (26.8 kB)

Uploaded Oct 16, 2020 Python 3

Hashes for automlsa2-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`cb08900510f6d554eb002cdb22ccf18eea20c0f42c6d0407a3442a827238a977`
MD5	`62465e50676a495f5f5f9ec401998beb`
BLAKE2b-256	`28c1a40593d73a1e891338feb16d2a1edef9aa31c224d6bd1360a7ef06f6c148`

Hashes for automlsa2-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2eb0450ef599a59e096470abb288b0eb8f9db3b046acce039db5f934d4c099de`
MD5	`68ead44b8b96327c3874a557ebd176be`
BLAKE2b-256	`fae1e7564706aaceb7324fc19434ec419ec60de31b4cd708a5503a893dbd4f4e`