Remove outlier sequences from multiple sequence alignment

Project description

seqSieve tries to remove sequences that cause misalignments from a multiple sequence alignment (MSA). It reads an MSA in multi-FASTA format, removes the sequences with the highest penalty scores, and then builds the next MSA without those sequences. This process repeats until a user-specified cut-off is reached or fewer than three sequences are left to align.
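The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not seqSieve's actual code: the function names `sieve` and `gap_fraction`, the dict-based alignment, the gap-fraction penalty, and the choice to drop one sequence per round are all assumptions made for the example. The `realign` argument is a hypothetical hook standing in for a call to an external aligner.

```python
def gap_fraction(seq):
    """Toy penalty (assumption): fraction of alignment columns that are gaps."""
    return seq.count("-") / len(seq) if seq else 0.0


def sieve(alignment, cutoff, realign=None, penalty=gap_fraction, max_iterations=None):
    """Iteratively drop the highest-penalty sequence from an alignment.

    alignment: dict mapping sequence name -> aligned sequence string.
    cutoff: stop once the worst penalty is at or below this value.
    realign: optional callable that rebuilds the MSA from the survivors
             (hypothetical stand-in for an external aligner call).
    """
    current = dict(alignment)
    removed = []
    iteration = 0
    # Stop when fewer than three sequences are left to align.
    while len(current) >= 3:
        if max_iterations is not None and iteration >= max_iterations:
            break
        scores = {name: penalty(seq) for name, seq in current.items()}
        worst = max(scores, key=scores.get)
        if scores[worst] <= cutoff:
            break  # cut-off reached: every remaining sequence is acceptable
        removed.append(worst)
        del current[worst]
        if realign is not None:
            current = realign(current)  # build the next MSA without the outlier
        iteration += 1
    return current, removed


msa = {"a": "ACGT", "b": "ACGT", "c": "ACGT", "d": "A---"}
kept, removed = sieve(msa, cutoff=0.5)
print(sorted(kept), removed)
```

Passing `realign=None` corresponds loosely to skipping realignment between rounds, which the usage text below marks as not recommended.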

Usage:

######################################
# seqSieve.py
######################################
usage:
   seqSieve.py -f multifasta alignment
options:
    -f, --fasta=FILE    multifasta alignment (e.g. "align.fas")
    OR
    -F, --fasta_dir=DIR directory with multifasta files (needs -s SUFFIX)
    -s, --suffix=SUFFIX will try to work with files that end with SUFFIX (e.g. ".fas")

    -a, --msa_tool=STR  supported: "mafft", "prank", "prankf" (= prank +F) [default: "mafft"]
    -i, --max_iterations=NUM    force stop after NUM iterations
    -n, --num_threads=NUM   max number of threads to be executed in parallel [default: 1]
    -m, --mode=MODE         set strategy to remove outlier sequences [default: "Sites"]
                            available modes (not case sensitive):
                                "Sites", "Gaps", "uGaps","Insertions",
                                "uInsertions","uInsertionsGaps", "custom"
    -q, --no-realign        don't realign with each iteration (not recommended)
    -l, --log       write logfile
    -h, --help      prints this

only for mode "custom":
    -g, --gap_penalty=NUM        set gap penalty [default: 1.0]
    -G, --unique_gap_penalty=NUM set unique gap penalty [default: 10.0]
    -j, --insertion_penalty=NUM  set insertion penalty [default: 1.0]
    -J, --unique_insertion_penalty=NUM set unique insertion penalty [default: 1.0]
    -M, --mismatch_penalty=NUM   set mismatch penalty [default: 1.0]
    -r, --match_reward=NUM       set match reward [default: -10.0]
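To show how the six "custom" weights could combine into one penalty per sequence, here is a rough sketch. The usage text does not define how seqSieve classifies columns, so the rules below are assumptions made for illustration: a gap is "unique" when only one sequence has it, an "insertion" is a residue in a column where most sequences have a gap, and matches and mismatches are judged against the most common residue in the column. The names `DEFAULTS` and `score_sequences` are hypothetical; only the default values come from the usage text.

```python
from collections import Counter

# Default weights copied from the usage text; the key names are illustrative.
DEFAULTS = {
    "gap": 1.0,
    "unique_gap": 10.0,
    "insertion": 1.0,
    "unique_insertion": 1.0,
    "mismatch": 1.0,
    "match": -10.0,
}


def score_sequences(alignment, weights=DEFAULTS):
    """Return {name: penalty} for a dict of equal-length aligned sequences."""
    names = list(alignment)
    n = len(names)
    length = len(next(iter(alignment.values())))
    scores = dict.fromkeys(names, 0.0)
    for col in range(length):
        column = [alignment[name][col] for name in names]
        gaps = column.count("-")
        residues = Counter(c for c in column if c != "-")
        consensus = residues.most_common(1)[0][0] if residues else None
        mostly_gaps = gaps > n / 2
        for name, char in zip(names, column):
            if char == "-":
                if mostly_gaps:
                    continue  # gap agrees with a gap-dominated column: no charge (assumption)
                scores[name] += weights["unique_gap"] if gaps == 1 else weights["gap"]
            elif mostly_gaps:
                # Residue where most sequences have a gap: treated as an insertion.
                only_one = (n - gaps) == 1
                scores[name] += weights["unique_insertion"] if only_one else weights["insertion"]
            elif char == consensus:
                scores[name] += weights["match"]  # negative value acts as a reward
            else:
                scores[name] += weights["mismatch"]
    return scores


scores = score_sequences({"s1": "ACGT", "s2": "ACGT", "s3": "ACGT", "s4": "A-TT"})
print(max(scores, key=scores.get))
```

Because the match reward is negative, well-aligned sequences end up with low totals, and the sequence with the highest total is the candidate for removal.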

Currently supported multiple sequence aligners:

Requirements

  • matplotlib

  • numpy

External Programs

  • mafft and/or

  • prank
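Since at least one external aligner has to be installed, it can help to check for it before starting a run. The sketch below uses the standard library's `shutil.which` to look for the executables on `PATH`. It assumes the binaries are installed under the names `mafft` and `prank`; the helper name `find_aligners` is made up for this example and is not part of seqSieve.

```python
import shutil


def find_aligners(candidates=("mafft", "prank")):
    """Return {program name: full path or None} for each candidate aligner."""
    return {name: shutil.which(name) for name in candidates}


found = find_aligners()
for name, path in found.items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If both entries come back as `None`, neither aligner is reachable from the current environment and an alignment run would presumably fail.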

Download files


Source Distribution

seqSieve-0.9.1.tar.gz (9.4 kB)

Built Distributions

seqSieve-0.9.1-py3.2.egg (29.3 kB)

seqSieve-0.9.1-py2.7.egg (28.7 kB)
