A parsing tool for AMP tools.


AMPcombi: AntiMicrobial Peptides parsing and functional classification tool

This tool parses the results of antimicrobial peptide (AMP) prediction tools into a single table and aligns the hits against a reference AMP database for functional classification.

For parsing: AMPcombi is developed to parse the output of these AMP prediction tools:

Tool	Version	Link
Ampir	1.1.0	https://github.com/Legana/ampir
AMPlify	1.0.3	https://github.com/bcgsc/AMPlify
Macrel	1.1.0	https://github.com/BigDataBiology/macrel
HMMsearch	3.3.2	https://github.com/EddyRivasLab/hmmer
EnsembleAMPpred	-	https://pubmed.ncbi.nlm.nih.gov/33494403/
NeuBI	-	https://github.com/nafizh/NeuBI

For classification: AMPcombi offers functional annotation of the detected AMPs by aligning them to an AMP reference database, for example:

Database	Version	Link
DRAMP	3.0	https://github.com/CPU-DRAMP/DRAMP-3.0

Alignment to the reference database is done using diamond blastp v.2.0.15.

======================

Installation

======================

To install AMPcombi:

AMPcombi depends on python > 3.0, biopython, pandas and diamond. Installation can be done using one of the following:

pip installation

pip install AMPcombi

git repository

git clone https://github.com/Darcy220606/AMPcombi.git

conda

conda env create -f ampcombi/environment.yml

conda install -c bioconda AMPcombi

======================

Usage:

======================

There are two basic commands to run AMPcombi:

Using --amp_results

ampcombi \
--amp_results path/to/my/result_folder/ \
--faa path/to/sample_faa_files/

Here, the top-level folder containing the output files has to be given. AMPcombi finds and summarizes the output files from the different tools, provided the folder is structured and named as: /result_folder/toolsubdir/samplesubdir/sample.tool.filetype.

Note that the filetype ending might vary; if it differs from the default, it can be specified with --tooldict. When passing a dictionary via the command line, it has to be given as a string enclosed in single quotes ' ', with the dictionary keys and items in double quotes " ", e.g. '{"key1":"item1", "key2":"item2"}'
Note that --sample_list can also be given if only specific samples are needed from the directory.
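
The quoting rule above matters because the shell strips the outer single quotes and hands the rest to the program as one string. The following is a minimal sketch of how such a string can be turned into a dictionary, assuming it is parsed as JSON; how AMPcombi parses it internally is not stated here, and parse_tooldict is a hypothetical helper name.

```python
import json


def parse_tooldict(raw: str) -> dict:
    """Parse a --tooldict style string into a dict of tool name -> file ending.

    Hypothetical helper for illustration, not part of AMPcombi.
    """
    # json.loads only accepts double-quoted keys and values,
    # which is why the inner quotes have to be double quotes.
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("tooldict must be a dictionary")
    return {str(key): str(value) for key, value in parsed.items()}


# What the program receives once the shell has removed the outer single quotes:
tooldict = parse_tooldict('{"ampir":"ampir.tsv", "macrel":"macrel.tsv"}')
print(tooldict["macrel"])
```

With single quotes inside the dictionary instead, json.loads raises an error, which mirrors the requirement stated in the note.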

The path to the folder containing the respective protein fasta files has to be provided with --faa. The files have to be named <samplename>.faa.

Structure of the results folder:

amp_results/
├── tool_1/
|   ├── sample_1/
|   |   └── sample_1.tool_1.tsv
|   └── sample_2/
|       └── sample_2.tool_1.tsv
├── tool_2/
|   ├── sample_1/
|   |   └── sample_1.tool_2.txt
|   └── sample_2/
|       └── sample_2.tool_2.txt
└── tool_3/
    ├── sample_1/
    |   └── sample_1.tool_3.predict
    └── sample_2/
        └── sample_2.tool_3.predict
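
To make the expected layout concrete, here is a minimal sketch of how files in a tree like the one above could be collected per sample. It is an illustration under assumptions, not AMPcombi's own implementation: find_tool_outputs is a hypothetical function, and the tooldict argument maps tool names to file endings in the same way as --tooldict.

```python
from pathlib import Path


def find_tool_outputs(result_folder, tooldict):
    """Map sample name -> list of output files found under
    result_folder/<tool_subdir>/<sample_subdir>/<sample>.<file ending>.

    Hypothetical helper for illustration only.
    """
    found = {}
    for tool_dir in sorted(Path(result_folder).iterdir()):
        if not tool_dir.is_dir():
            continue
        for sample_dir in sorted(tool_dir.iterdir()):
            if not sample_dir.is_dir():
                continue
            # The sample name is taken from the sample subfolder name.
            for ending in tooldict.values():
                candidate = sample_dir / f"{sample_dir.name}.{ending}"
                if candidate.is_file():
                    found.setdefault(sample_dir.name, []).append(candidate)
    return found
```

Files that do not follow the sample.tool.filetype naming are simply not picked up in this sketch, which is why the folder and file names matter.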

Using --path_list and --sample_list

ampcombi \
--path_list path_to_sample_1_tool_1.csv path_to_sample_1_tool_2.csv \
--path_list path_to_sample_2_tool_1.csv path_to_sample_2_tool_2.csv \
--sample_list sample_1 sample_2 \
--faa path/to/sample_faa_files/

Here, the paths to the output files to be summarized are given with --path_list, once for each sample. A list of sample names has to be supplied together with this option. For --faa, provide either the path to the folder containing the respective protein fasta files or, in case of only one sample, the path to the corresponding .faa file. The files have to be named <samplename>.faa.
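
The two accepted forms of --faa (a folder, or a single file when there is only one sample) can be sketched as follows. This is a hedged illustration, not AMPcombi's code: resolve_faa is a hypothetical helper, and it assumes the strict <samplename>.faa naming described above.

```python
from pathlib import Path


def resolve_faa(faa, sample_name):
    """Return the .faa file for a sample, given either a folder or a single file.

    Hypothetical helper for illustration only.
    """
    path = Path(faa)
    # A folder is searched for <samplename>.faa; a file is used directly.
    candidate = path / f"{sample_name}.faa" if path.is_dir() else path
    if not candidate.is_file() or candidate.name != f"{sample_name}.faa":
        raise FileNotFoundError(f"no {sample_name}.faa found at {faa}")
    return candidate
```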

Input options:

option	definition	default	example
--amp_results	path to the folder containing the different tools' output files	./test_files/	../amp_results/
--sample_list	list of sample names	-	sample_1 sample_2
--path_list	list of paths to output files	-	path_to_sample_1_tool_1.csv path_to_sample_1_tool_2.csv
--cutoff	probability cutoff to filter AMPs	0	0.5
--faa	path to the folder containing the samples' `.faa` files or, in case of only one sample, the path to the corresponding `.faa` file. Filenames have to contain the corresponding sample name, e.g. sample_1.faa	./test_faa/	./faa_files/
--tooldict	dictionary of AMP-tools and their respective output file endings	'{"ampir":"ampir.tsv", "amplify":"amplify.tsv", "macrel":"macrel.tsv", "hmmer_hmmsearch":"hmmsearch.txt", "ensembleamppred":"ensembleamppred.txt"}'	-
--amp_database	path to the folder containing the reference database files: (1) a fasta file with the <.fasta> file extension and (2) the corresponding table with functional and taxonomic classifications, with the <.tsv> file extension	DRAMP 'general amps' database	./amp_ref_database/
--complete_summary	concatenates all samples' summarized tables into one and generates both 'csv' and interactive 'html' files	False	True
--log	print messages into a log file instead of stdout	False	True
--threads	number of threads to use for the DIAMOND alignment, depending on the computing resources available	4	32
--version	print the version number to stdout	-	0.1.4

Note: The fasta file corresponding to the AMP database should not contain any characters other than ['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y']
Note: The reference database table should be tab delimited.
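
Before pointing --amp_database at a custom reference, it can help to check the fasta file against the residue list in the note above. The following is a minimal sketch using only the standard library; invalid_records is a hypothetical helper, not an AMPcombi function, and it treats lowercase letters as invalid because the note lists uppercase residues only.

```python
# The 20 residues allowed by the note above.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_records(fasta_text):
    """Return {header: sorted invalid characters} for records that contain
    anything other than the 20 allowed residues."""
    problems = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
        elif header is not None:
            bad = set(line) - VALID_RESIDUES
            if bad:
                problems.setdefault(header, set()).update(bad)
    return {name: sorted(chars) for name, chars in problems.items()}


example = ">amp_ok\nGIGKFLHSAKKF\n>amp_bad\nGIGXFLB*\n"
print(invalid_records(example))
```

An empty result means every sequence line uses only the allowed characters.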

Output:

The output is written to your working directory and contains the following files and folders:

<pwd>/
├── amp_ref_database/
|   ├── amp_ref.dmnd
|   ├── general_amps_<DATE>_clean.fasta
|   └── general_amps_<DATE>.tsv
├── sample_1/
|   ├── sample_1_amp.faa
|   ├── sample_1_ampcombi.csv
|   └── sample_1_diamond_matches.txt
├── sample_2/
|   ├── sample_2_amp.faa
|   ├── sample_2_ampcombi.csv
|   └── sample_2_diamond_matches.txt
├── AMPcombi_summary.csv
├── AMPcombi_summary.html
└── ampcombi.log

======================

Contribution:

======================

AMPcombi is a tool developed for parsing results from published AMP prediction tools. We therefore welcome fellow contributors who would like to add the results of new AMP prediction tools for parsing and alignment.

Adding a new tool to AMPcombi

In ampcombi/reformat_tables.py

add a new tool function that reads the tool's output into a pandas dataframe and returns two columns named contig_id and prob_<toolname>
add the new function to the read_path function

In ampcombi/main.py

add your tool and its default file ending (tool:tool.fileending) to the default of --tooldict
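
The reader function described for ampcombi/reformat_tables.py could look roughly like the sketch below. Everything about the input is an assumption made for illustration: the tool name "newtool", the tab-separated format, and the input column names sequence_id and score are hypothetical. Only the two returned column names, contig_id and prob_<toolname>, follow the convention given above.

```python
import io

import pandas as pd


def newtool(path):
    """Read the output of a hypothetical 'newtool' and return the two
    columns contig_id and prob_newtool."""
    # Assumed input: a tab-separated table with 'sequence_id' and 'score' columns.
    table = pd.read_csv(path, sep="\t")
    table = table.rename(columns={"sequence_id": "contig_id", "score": "prob_newtool"})
    return table[["contig_id", "prob_newtool"]]


# Small in-memory example standing in for a real output file.
example = io.StringIO("sequence_id\tscore\tlength\ncontig_1\t0.91\t34\ncontig_2\t0.12\t58\n")
print(newtool(example))
```

Any extra columns in the tool's output are dropped, so that every reader returns the same two-column shape.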

======================

Authors: @louperelo and @darcy220606

Release history

Version	Release date
0.2.2	Mar 21, 2024
0.2.1	Mar 13, 2024
0.2.0	Feb 20, 2024
0.1.7 (this version)	Nov 3, 2022
0.1.6	Nov 2, 2022
0.1.5	Oct 27, 2022
0.1.4	Oct 18, 2022
0.1.3	Oct 7, 2022
0.1.2	Oct 7, 2022
0.1.1	Oct 7, 2022
0.1.0	Oct 6, 2022

Source distribution

AMPcombi-0.1.7.tar.gz (15.0 kB), uploaded Nov 3, 2022

Hashes for AMPcombi-0.1.7.tar.gz

Algorithm	Hash digest
SHA256	`a73ab40cc80670f403aa629d1c1d7318ce21cc952618fdc1eba0ef667fae3578`
MD5	`88e4a58431fce0158523e6b6bcdab3cd`
BLAKE2b-256	`0fca50ddcd2c41ee80a1d5a685c27ab6deb6122be9871740eb794c18c0f3f4a2`