jcvi · PyPI

Python utility libraries on genome assembly, annotation and comparative genomics

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Programming Language
- Python
- Python :: 2
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

Collection of Python libraries to parse bioinformatics files, or perform computation related to assembly, annotation, and comparative genomics.

Authors: Haibao Tang (tanghaibao), Vivek Krishnakumar (vivekkrish), Jingping Li (Jingping), Maria Kim (msarmien), Xingtan Zhang (tangerzhang)
Email: tanghaibao@gmail.com
License: BSD

The following modules provide generic bioinformatics handling methods.

algorithms
- Linear programming solver with SCIP and GLPK.
- Supermap: find set of non-overlapping anchors in BLAST or NUCMER output.
- Longest or heaviest increasing subsequence.
- Matrix operations.
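
As an illustration of the "longest increasing subsequence" item above, here is a minimal sketch of the classic patience-sorting approach in Python 3. It is not the code shipped in jcvi; the function name `longest_increasing_subsequence` is made up for this example, and it handles only the strictly increasing, unweighted case.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return one longest strictly increasing subsequence of `values`.

    Hypothetical helper for illustration only, not part of the jcvi API.
    Runs in O(n log n) time.
    """
    tail_index = []   # tail_index[k]: index of the smallest tail of a run of length k + 1
    tail_value = []   # values at those indices, kept sorted so bisect can search them
    previous = [None] * len(values)  # back-pointers used to rebuild the subsequence

    for i, value in enumerate(values):
        k = bisect_left(tail_value, value)
        previous[i] = tail_index[k - 1] if k > 0 else None
        if k == len(tail_index):
            tail_index.append(i)
            tail_value.append(value)
        else:
            tail_index[k] = i
            tail_value[k] = value

    # Walk the back-pointers from the tail of the longest run.
    result = []
    i = tail_index[-1] if tail_index else None
    while i is not None:
        result.append(values[i])
        i = previous[i]
    return result[::-1]
```

For anchor chaining, the input would typically be the target positions of anchors already sorted by query position, so that an increasing run corresponds to a collinear chain.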
apps
- GenBank Entrez accession and Phytozome downloader.
- Calculate (non)synonymous substitution rate between gene pairs.
- Basic phylogenetic tree construction using PHYLIP, PhyML, or RAxML, and visualization.
- Wrapper for BLAST+, LASTZ, LAST, BWA, BOWTIE2, CLC, CDHIT, CAP3, etc.
formats
- Currently supports .ace format (phrap, cap3, etc.), .agp (goldenpath), .bed format, .blast output, .btab format, .cas (CLC assembler output), .coords format (nucmer output), .fasta format, .fastq format, .fpc format, .gff format, .obo format (ontology), .psl format (UCSC blat, GMAP, etc.), .posmap format (Celera assembler output), .sam format (read mapping), .contig format (TIGR assembly format), etc.
graphics
- BLAST or synteny dot plot.
- Histogram using R and ASCII art.
- Paint regions on set of chromosomes.
- Macro-synteny and micro-synteny plots.
utils
- Grouper can be used as disjoint set data structure.
- range contains common range operations, like overlap and chaining.
- Sybase connector to JCVI internal database.
- Miscellaneous cookbook recipes, iterators, decorators, table utilities.
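
To make the "disjoint set" and "range overlap" ideas above concrete, here is a small Python 3 sketch. The class `DisjointSet` and the function `overlap` are hypothetical names chosen for this example; they are not the Grouper or range APIs of jcvi, whose signatures are not shown on this page.

```python
class DisjointSet:
    """Minimal union-find structure (illustrative, not the jcvi Grouper)."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        """Return the representative of the set containing `item`."""
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[item] != root:
            next_item = self.parent[item]
            self.parent[item] = root
            item = next_item
        return root

    def join(self, a, b):
        """Merge the sets containing `a` and `b`."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def joined(self, a, b):
        """Return True if `a` and `b` are in the same set."""
        return self.find(a) == self.find(b)

    def groups(self):
        """Return all sets as sorted lists, in sorted order."""
        members = {}
        for item in list(self.parent):
            members.setdefault(self.find(item), []).append(item)
        return sorted(sorted(group) for group in members.values())


def overlap(a, b):
    """Number of shared positions between two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
```

A typical use would be to `join` gene pairs that hit each other and then read off gene families from `groups()`.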

Then there are modules that contain domain-specific methods.

assembly
- K-mer histogram analysis.
- Preparation and validation of tiling path for clone-based assemblies.
- Scaffolding through BAMBUS, optical map and genetic map.
- Pre-assembly and post-assembly QC procedures.
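
For readers unfamiliar with the "K-mer histogram" item above, the following Python 3 sketch shows what such a histogram contains. It is a simplified stand-in, not the jcvi implementation: the function name `kmer_histogram` is invented here, and the sketch does not merge a k-mer with its reverse complement as dedicated k-mer counters usually do.

```python
from collections import Counter


def kmer_histogram(sequences, k):
    """Map multiplicity -> number of distinct k-mers seen that many times.

    Hypothetical helper for illustration; k-mers containing N are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    # Collapse per-k-mer counts into the histogram of multiplicities.
    return Counter(counts.values())
```

On real read sets, the position of the main peak in this histogram is commonly used to estimate sequencing depth and genome size.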
annotation
- Training of ab initio gene predictors.
- Calculate gene, exon and intron statistics.
- Wrapper for PASA and EVM.
- Launch multiple MAKER processes.
compara
- C-score based BLAST filter.
- Synteny scan (de-novo) and lift over (find nearby anchors).
- Ancestral genome reconstruction using Sankoff’s and PAR methods.
- Ortholog and tandem gene duplicates finder.

Applications

Please visit the wiki for full-fledged applications.

Dependencies

The following third-party Python packages are used by some routines in the library. These dependencies are not mandatory, since only a few modules use them.

Various scripts import other Python modules as well. The simplest approach is to install them with pip install when you see an ImportError.
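
The "install on ImportError" workflow can be made friendlier with a small guard around optional imports. This is a sketch under assumptions, written for Python 3: the helper name `require` is hypothetical and is not something jcvi provides.

```python
import importlib


def require(module_name, package_name=None):
    """Import an optional module, or fail with an install hint.

    `package_name` is the name to pass to pip when it differs from the
    importable module name.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        package = package_name or module_name
        raise ImportError(
            f"Optional dependency '{module_name}' is missing; "
            f"try: pip install {package}"
        ) from exc
```

A script would then call, for example, `np = require("numpy")` at the point where the optional package is first needed, so users who never run that action never need the dependency.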

Installation

The easiest way is to install it via PyPI:

easy_install jcvi

To install the development version:

pip install git+git://github.com/tanghaibao/jcvi.git

Alternatively, if you want to install manually:

cd ~/code  # or any directory of your choice
git clone git://github.com/tanghaibao/jcvi.git
export PYTHONPATH=~/code:$PYTHONPATH

Replace ~/code above with any directory you like, as long as it contains the jcvi directory. To avoid setting PYTHONPATH every time, add the export command to your .bashrc or .bash_profile.

In addition, a few modules might ask for the locations of external programs if the executables cannot be found in your PATH. The external programs that are often used are:

Most of the scripts in this package contain multiple actions. Taking the fasta module as an example:

Usage:
    python -m jcvi.formats.fasta ACTION


Available ACTIONs:
          clean | Remove irregular chars in fasta seqs
           diff | Check if two fasta records contain same information
        extract | Given fasta file and seq id, retrieve the sequence in fasta format
          fastq | Combine fasta and qual to create fastq file
         filter | Filter the records by size
         format | Trim accession id to the first space or switch id based on 2-column mapping file
        fromtab | Convert 2-column sequence file to fasta format
           gaps | Print out a list of gap sizes within sequences
      identical | Given 2 fasta files, find all exactly identical records
            ids | Generate a list of headers
           info | Run `sequence_info` on fasta files
          ispcr | Reformat paired primers into ispcr query format
           join | Concatenate a list of seqs and add gaps in between
     longestorf | Find longest orf for cds fasta
           pair | Sort paired reads to .pairs, rest to .fragments
    pairinplace | Starting from fragment.fasta, find if adjacent records can form pairs
           pool | Pool a bunch of fastafiles together and add prefix
         random | Randomly take some records
         sequin | Generate a gapped fasta file for sequin submission
           some | Include or exclude a list of records (also performs on .qual file if available)
           sort | Sort the records by ids, sizes, etc.
        summary | Report the real no of bases and n's in fasta files
           tidy | Normalize gap sizes and remove small components in fasta
      translate | Translate cds to proteins
           trim | Given a cross_match screened fasta, trim the sequence
           uniq | Remove records that are the same

When you need to use one action, you can just run:

python -m jcvi.formats.fasta extract

This will tell you the options and arguments it expects.
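
To show the idea behind a script that bundles several ACTIONs, here is a self-contained Python 3 sketch with two toy actions named after entries in the list above. Everything in it (`parse_fasta`, `extract`, `ids`, `run`, the `ACTIONS` table) is hypothetical and works on in-memory text; the real jcvi.formats.fasta module has its own dispatcher and options, which this page does not show.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract(text, seqid):
    """Return the record whose id (header up to the first space) is `seqid`."""
    for header, seq in parse_fasta(text):
        if header.split()[0] == seqid:
            return f">{header}\n{seq}"
    return None


def ids(text):
    """Return the list of record ids."""
    return [header.split()[0] for header, _ in parse_fasta(text)]


# Action name -> function, mirroring the "ACTION" argument on the command line.
ACTIONS = {"extract": extract, "ids": ids}


def run(action, *args):
    """Look up `action` and call it, listing valid choices on a bad name."""
    if action not in ACTIONS:
        raise ValueError(f"Unknown action; choose from: {sorted(ACTIONS)}")
    return ACTIONS[action](*args)
```

With `text = ">seq1 demo\nACGT\nAC\n>seq2\nGGG\n"`, calling `run("ids", text)` lists both ids and `run("extract", text, "seq1")` returns the first record with its sequence lines joined.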

Feel free to check out the other scripts in the package; it is not just for FASTA.

Release history

1.4.4

Apr 13, 2024

1.4.2

Mar 1, 2024

1.3.9

Dec 30, 2023

1.3.8

Aug 30, 2023

1.3.7

Aug 29, 2023

1.3.6

Jun 23, 2023

1.3.5

May 20, 2023

1.3.4

Apr 27, 2023

1.3.3

Feb 17, 2023

1.3.2

Feb 8, 2023

1.3.1

Feb 6, 2023

1.2.20

Nov 26, 2022

1.2.19

Nov 26, 2022

1.2.18

Nov 26, 2022

1.2.17

Nov 25, 2022

1.2.16

Nov 24, 2022

1.2.15

Nov 22, 2022

1.2.14

Oct 5, 2022

1.2.13

Oct 5, 2022

1.2.12

Sep 27, 2022

1.2.11

Sep 8, 2022

1.2.10

Jul 8, 2022

1.2.9

Jun 29, 2022

1.2.8

Jun 28, 2022

1.2.7

Mar 21, 2022

1.2.6

Mar 20, 2022

1.2.5

Mar 18, 2022

1.2.4

Mar 14, 2022

1.2.3

Mar 8, 2022

1.2.1

Jan 21, 2022

1.1.23

Dec 25, 2021

1.1.22

Dec 14, 2021

1.1.21

Nov 25, 2021

1.1.20

Nov 22, 2021

1.1.19

Nov 5, 2021

1.1.18

Sep 9, 2021

1.1.17

Jul 22, 2021

1.1.16

Jul 13, 2021

1.1.15

Jun 27, 2021

1.1.14

Jun 19, 2021

1.1.13

Jun 19, 2021

1.1.12

Apr 18, 2021

1.1.11

Mar 26, 2021

1.1.10

Mar 21, 2021

1.1.9

Mar 21, 2021

1.1.8

Feb 18, 2021

1.1.7

Jan 25, 2021

1.1.6

Jan 15, 2021

1.1.5

Jan 11, 2021

1.1.4

Jan 9, 2021

1.1.3

Jan 9, 2021

1.1.2

Jan 9, 2021

1.1.1

Jan 9, 2021

1.0.14

Dec 13, 2020

1.0.13

Nov 29, 2020

1.0.12

Nov 25, 2020

1.0.11

Nov 5, 2020

1.0.10

Oct 28, 2020

1.0.9

Jul 25, 2020

1.0.8

Jun 25, 2020

1.0.7

Jun 15, 2020

1.0.6

Apr 15, 2020

1.0.5

Mar 3, 2020

1.0.4

Mar 3, 2020

1.0.3

Feb 11, 2020

1.0.2

Feb 10, 2020

1.0.1

Jan 31, 2020

0.9.14

Dec 10, 2019

0.9.13

Nov 3, 2019

0.9.12

Oct 3, 2019

0.9.11

Sep 28, 2019

0.9.10

Sep 25, 2019

0.9.9

Sep 19, 2019

0.9.6

Sep 14, 2019

0.8.12

Dec 10, 2018

0.8.4

Apr 28, 2018

0.7.7

Oct 15, 2017

0.7.6

Oct 7, 2017

0.7.5

Aug 31, 2017

0.7.4

Aug 23, 2017

0.7.3

Mar 6, 2017

0.7.1

Jan 20, 2017

0.6.9

Oct 2, 2016

0.6.6

Jun 26, 2016

0.6.2

Mar 6, 2016

0.6.1

Jan 26, 2016

0.5.9

Nov 14, 2015

0.5.8

Oct 14, 2015

0.5.7

Jul 29, 2015

0.5.6

Jul 7, 2015

0.5.5

Mar 31, 2015

0.5.4

Mar 31, 2015

0.5.3

Feb 16, 2015

0.5.2

Feb 9, 2015

0.5.1

Jan 3, 2015

0.4.12

Dec 22, 2014

0.4.10

Oct 11, 2014

0.4.9

Sep 18, 2014

0.4.8

Aug 26, 2014

This version

0.4.7

Jul 18, 2014

Download files

Source Distribution

jcvi-0.4.7.tar.gz (559.3 kB)

Uploaded Jul 18, 2014 Source

Hashes for jcvi-0.4.7.tar.gz
Algorithm	Hash digest
SHA256	`9181ce8cf9768c1829f72c542aa801ba7d21f218e50c0776da5e1d110479d96d`
MD5	`792e14ea0915bbbf2c2b9e207eccde27`
BLAKE2b-256	`9a4fc53e7eb012d5a42fca39fa8669a7d58f14673abbbe328b8c7e775cbf92cb`
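
The digests above can be checked against a downloaded copy of the archive. The sketch below uses only the Python 3 standard library; the helper names `sha256_of` and `verify` are made up for this example, and it assumes the archive has been saved locally as jcvi-0.4.7.tar.gz.

```python
import hashlib

# SHA256 digest copied from the table above.
EXPECTED_SHA256 = "9181ce8cf9768c1829f72c542aa801ba7d21f218e50c0776da5e1d110479d96d"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` matches the expected digest."""
    return sha256_of(path) == expected.lower()
```

Calling `verify("jcvi-0.4.7.tar.gz")` should return True for an intact download; a False result means the file differs from the one uploaded on Jul 18, 2014.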