A parsing tool for AMP tools.
AMPcombi: AntiMicrobial Peptides parsing and functional classification tool
This tool parses the results of antimicrobial peptide (AMP) prediction tools into a single table and aligns the hits against a reference AMP database for functional classifications.
For parsing: AMPcombi is developed to parse the output of these AMP prediction tools:
Tool | Version | Link |
---|---|---|
Ampir | 1.1.0 | https://github.com/Legana/ampir |
AMPlify | 1.0.3 | https://github.com/bcgsc/AMPlify |
Macrel | 1.1.0 | https://github.com/BigDataBiology/macrel |
HMMsearch | 3.3.2 | https://github.com/EddyRivasLab/hmmer |
EnsembleAMPpred | - | https://pubmed.ncbi.nlm.nih.gov/33494403/ |
NeuBI | - | https://github.com/nafizh/NeuBI |
For classification: AMPcombi offers functional annotation of the detected AMPs by aligning them to an AMP reference database, e.g.:
Database | Version | Link |
---|---|---|
DRAMP | 3.0 | https://github.com/CPU-DRAMP/DRAMP-3.0 |
Alignment to the reference database is done using diamond blastp v2.0.15.
======================
Installation
======================
To install AMPcombi:
Install the dependencies of the tool: python > 3.0, biopython, pandas and diamond.
Installation can be done using:
- pip installation
pip install AMPcombi
- git repository
git clone https://github.com/Darcy220606/AMPcombi.git
- conda
conda env create -f ampcombi/environment.yml
or
conda install -c bioconda AMPcombi
======================
Usage:
======================
There are two basic commands to run AMPcombi:
- Using --amp_results
ampcombi \
--amp_results path/to/my/result_folder/ \
--faa path/to/sample_faa_files/
Here the top-level folder containing the output files has to be given. AMPcombi finds and summarizes the output files from the different tools if the folder is structured and named as /result_folder/toolsubdir/samplesubdir/sample.tool.filetype.
- Note that the filetype ending might vary and can be specified with --tooldict if it differs from the default. When passing a dictionary via the command line, wrap the whole dictionary in single quotes ' ' and the dictionary keys and items in double quotes " ", e.g. '{"key1":"item1", "key2":"item2"}'
- Note that --sample_list can also be given if only specific samples are needed from the directory.
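The quoting rule for --tooldict can be illustrated with a short Python sketch. It assumes the string is read as JSON, which this README does not state; the tool names and file endings are taken from the documented default of --tooldict.

```python
import json
import shlex

# Tool names and file endings copied from the documented --tooldict default.
tooldict = {
    "ampir": "ampir.tsv",
    "amplify": "amplify.tsv",
    "macrel": "macrel.tsv",
}

# json.dumps always writes keys and items with double quotes.
as_json = json.dumps(tooldict)

# shlex.quote wraps the JSON in single quotes, so the shell passes it on unchanged.
cli_argument = shlex.quote(as_json)
print("--tooldict", cli_argument)

# Round trip: what a JSON-based parser (an assumption) would recover
# after the shell has stripped the outer single quotes.
recovered = json.loads(shlex.split(cli_argument)[0])
```

Building the argument this way avoids hand-written quoting mistakes when the dictionary is generated by a script.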
The path to the folder containing the respective protein fasta files has to be provided with --faa. The files have to be named <samplename>.faa.
Structure of the results folder:
amp_results/
├── tool_1/
|   ├── sample_1/
|   |   └── sample_1.tool_1.tsv
|   └── sample_2/
|       └── sample_2.tool_1.tsv
├── tool_2/
|   ├── sample_1/
|   |   └── sample_1.tool_2.txt
|   └── sample_2/
|       └── sample_2.tool_2.txt
└── tool_3/
    ├── sample_1/
    |   └── sample_1.tool_3.predict
    └── sample_2/
        └── sample_2.tool_3.predict
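To check a results folder before running AMPcombi, the layout above can be walked with a few lines of Python. This is only a sketch of the naming convention, not AMPcombi's own code: the function name find_tool_outputs is hypothetical, and it assumes files are matched by their ending and that the sample name is the name of the enclosing folder.

```python
from pathlib import Path
import tempfile

def find_tool_outputs(result_folder, tooldict, samples=None):
    """Map sample -> tool -> path for files <result_folder>/*/<sample>/*.<ending>."""
    found = {}
    for tool, ending in tooldict.items():
        # Two directory levels (tool, sample) followed by the output file.
        for path in sorted(Path(result_folder).glob(f"*/*/*.{ending}")):
            sample = path.parent.name
            if samples is not None and sample not in samples:
                continue
            found.setdefault(sample, {})[tool] = path
    return found

# Recreate the example tree from this README in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    layout = {"tool_1": "tool_1.tsv", "tool_2": "tool_2.txt", "tool_3": "tool_3.predict"}
    for tool, ending in layout.items():
        for sample in ("sample_1", "sample_2"):
            sample_dir = Path(tmp) / tool / sample
            sample_dir.mkdir(parents=True)
            (sample_dir / f"{sample}.{ending}").touch()

    all_hits = find_tool_outputs(tmp, layout)
    subset = find_tool_outputs(tmp, layout, samples=["sample_2"])
    summary = {sample: sorted(tools) for sample, tools in all_hits.items()}
    print(summary)
```

A sample that is missing from the summary, or that lists fewer tools than expected, usually points to a misnamed file or a file ending that needs to be set with --tooldict.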
- Using --path_list and --sample_list
ampcombi \
--path_list path_to_sample_1_tool_1.csv path_to_sample_1_tool_2.csv \
--path_list path_to_sample_2_tool_1.csv path_to_sample_2_tool_2.csv \
--sample_list sample_1 sample_2 \
--faa path/to/sample_faa_files/
Here the paths to the output files to be summarized are given with --path_list, once for each sample. Together with this option, a list of sample names has to be supplied with --sample_list.
Either the path to the folder containing the respective protein fasta files has to be provided with --faa or, in case of only one sample, the path to the corresponding .faa file. The files have to be named <samplename>.faa.
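The naming rule for the protein fasta files can be checked up front. The following sketch is not part of AMPcombi; the helper resolve_faa is hypothetical and simply mirrors the two cases described above (a folder of <samplename>.faa files, or a single .faa file for a single sample).

```python
from pathlib import Path
import tempfile

def resolve_faa(faa, samples):
    """Return {sample: path} for --faa given as a folder or as a single .faa file."""
    faa_path = Path(faa)
    if faa_path.is_file():
        if len(samples) != 1:
            raise ValueError("a single .faa file can only be used with one sample")
        candidates = {samples[0]: faa_path}
    else:
        candidates = {sample: faa_path / f"{sample}.faa" for sample in samples}
    # A sample counts as missing if its file is absent or not named <samplename>.faa.
    missing = [
        sample for sample, path in candidates.items()
        if not path.is_file() or path.name != f"{sample}.faa"
    ]
    if missing:
        raise FileNotFoundError("no <samplename>.faa found for: " + ", ".join(missing))
    return candidates

with tempfile.TemporaryDirectory() as tmp:
    for sample in ("sample_1", "sample_2"):
        (Path(tmp) / f"{sample}.faa").touch()

    resolved = resolve_faa(tmp, ["sample_1", "sample_2"])
    names = sorted(path.name for path in resolved.values())

    try:
        resolve_faa(tmp, ["sample_3"])
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
```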
Input options:
command | definition | default | example |
---|---|---|---|
--amp_results | path to the folder containing the different tools' output files | ./test_files/ | ../amp_results/ |
--sample_list | list of samples' names | - | sample_1 sample_2 |
--path_list | list of paths to output files | - | path_to_sample_1_tool_1.csv path_to_sample_1_tool_2.csv |
--cutoff | probability cutoff to filter AMPs | 0 | 0.5 |
--faa | path to the folder containing the samples' .faa files or, in case of only one sample, the path to the corresponding .faa file. Filenames have to contain the corresponding sample name, e.g. sample_1.faa | ./test_faa/ | ./faa_files/ |
--tooldict | dictionary of AMP-tools and their respective output file endings | '{"ampir":"ampir.tsv", "amplify":"amplify.tsv", "macrel":"macrel.tsv", "hmmer_hmmsearch":"hmmsearch.txt", "ensembleamppred":"ensembleamppred.txt"}' | - |
--amp_database | path to the folder containing the reference database files: (1) a fasta file with the <.fasta> file extension and (2) the corresponding table with functional and taxonomic classifications with the <.tsv> file extension | DRAMP 'general amps' database | ./amp_ref_database/ |
--complete_summary | concatenates all samples' summarized tables into one and generates both 'csv' and interactive 'html' files | False | True |
--log | print messages into log file instead of stdout | False | True |
--threads | number of threads used for the DIAMOND alignment, to be adjusted to the computing resources available | 4 | 32 |
--version | print the version number into stdout | - | 0.1.4 |
- Note: The fasta file corresponding to the AMP database should not contain any characters other than ['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y']
- Note: The reference database table should be tab delimited.
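A custom reference fasta can be screened for disallowed characters before it is passed to --amp_database. The sketch below uses only the Python standard library and a hypothetical helper name, invalid_records; the example sequences are made up for illustration.

```python
# The 20 characters allowed in the reference fasta, as listed in the note above.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY")

def invalid_records(fasta_text):
    """Return {header: sorted list of characters outside ALLOWED}."""
    problems = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
        elif header is not None:
            bad = set(line) - ALLOWED
            if bad:
                problems.setdefault(header, set()).update(bad)
    return {name: sorted(chars) for name, chars in problems.items()}

# Made-up records: one clean, one with ambiguity codes, one in lowercase.
example = """>amp_ok
GLFDIVKKVVGALGSL
>amp_ambiguous
GIGKFLXKAB
>amp_lowercase
kwklfkki
"""

report = invalid_records(example)
print(report)
```

Lowercase residues are reported here as well, since the note lists uppercase letters only; whether AMPcombi itself tolerates lowercase is not stated in this README.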
Output:
The output will be written into your working directory, containing the following files and folders:
<pwd>/
├── amp_ref_database/
|   ├── amp_ref.dmnd
|   ├── general_amps_<DATE>_clean.fasta
|   └── general_amps_<DATE>.tsv
├── sample_1/
|   ├── sample_1_amp.faa
|   ├── sample_1_ampcombi.csv
|   └── sample_1_diamond_matches.txt
├── sample_2/
|   ├── sample_2_amp.faa
|   ├── sample_2_ampcombi.csv
|   └── sample_2_diamond_matches.txt
├── AMPcombi_summary.csv
├── AMPcombi_summary.html
└── ampcombi.log
======================
Contribution:
======================
AMPcombi is a tool developed for parsing results from published AMP prediction tools. We therefore welcome fellow contributors who would like to add support for parsing and aligning the results of new AMP prediction tools.
Adding a new tool to AMPcombi
In ampcombi/reformat_tables.py
- add a new tool function that reads the output into a pandas dataframe and returns two columns named contig_id and prob_<toolname>
- add the new function to the read_path function
In ampcombi/main.py
- add your default tool:tool.fileending to the default of --tooldict
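As a rough illustration of the first step, a reader function for a new tool could look like the sketch below. Everything specific in it is an assumption: the tool name newtool, its tab-separated output, and the input column names seq_name and score are invented for the example. Only the two returned column names, contig_id and prob_<toolname>, come from the instructions above.

```python
import io

import pandas as pd

def newtool(path):
    """Read the (hypothetical) tab-separated output of 'newtool'.

    Returns a dataframe with exactly two columns: contig_id and prob_newtool.
    """
    table = pd.read_csv(path, sep="\t")
    # 'seq_name' and 'score' are assumed column names in the tool's output.
    return table[["seq_name", "score"]].rename(
        columns={"seq_name": "contig_id", "score": "prob_newtool"}
    )

# Made-up output of the hypothetical tool, read from memory instead of a file.
example_output = io.StringIO(
    "seq_name\tscore\tlength\n"
    "contig_1\t0.93\t24\n"
    "contig_2\t0.12\t31\n"
)
parsed = newtool(example_output)
print(parsed)
```

With such a function in place, the matching --tooldict entry for this made-up tool would be "newtool":"newtool.tsv".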
======================
Authors: @louperelo and @darcy220606