A Python package for interacting with SRAdb and downloading datasets from SRA/ENA/GEO
# A Python package for retrieving metadata from SRA/ENA/GEO
[![image](https://img.shields.io/pypi/v/pysradb.svg?style=flat-square)](https://pypi.python.org/pypi/pysradb) [![image](https://anaconda.org/bioconda/pysradb/badges/version.svg)](https://anaconda.org/bioconda/pysradb/badges/version.svg) [![image](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/pysradb/README.html) [![image](https://static.pepy.tech/personalized-badge/pysradb?period=month&units=international_system&left_color=black&right_color=brightgreen&left_text=Downloads/month)](https://pepy.tech/project/pysradb) [![image](https://anaconda.org/bioconda/pysradb/badges/downloads.svg)](https://anaconda.org/bioconda/pysradb) [![image](https://zenodo.org/badge/159590788.svg)](https://zenodo.org/badge/latestdoi/159590788) [![image](https://github.com/saketkc/pysradb/workflows/push/badge.svg)](https://github.com/saketkc/pysradb/actions)
## Documentation
<https://saketkc.github.io/pysradb>
## CLI Usage
pysradb supports command line usage. See [CLI](https://saket-choudhary.me/pysradb/cmdline.html) instructions or [quickstart guide](https://www.saket-choudhary.me/pysradb/quickstart.html).
```console
$ pysradb
usage: pysradb [-h] [--version] [--citation]
               {metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs}
               ...

pysradb: Query NGS metadata and data from NCBI Sequence Read Archive.
version: 2.0.1
Citation: 10.12688/f1000research.18676.1

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --citation            how to cite

subcommands:
  {metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs}
    metadata            Fetch metadata for SRA project (SRPnnnn)
    download            Download SRA project (SRPnnnn)
    search              Search SRA for matching text
    gse-to-gsm          Get GSM for a GSE
    gse-to-srp          Get SRP for a GSE
    gsm-to-gse          Get GSE for a GSM
    gsm-to-srp          Get SRP for a GSM
    gsm-to-srr          Get SRR for a GSM
    gsm-to-srs          Get SRS for a GSM
    gsm-to-srx          Get SRX for a GSM
    srp-to-gse          Get GSE for a SRP
    srp-to-srr          Get SRR for a SRP
    srp-to-srs          Get SRS for a SRP
    srp-to-srx          Get SRX for a SRP
    srr-to-gsm          Get GSM for a SRR
    srr-to-srp          Get SRP for a SRR
    srr-to-srs          Get SRS for a SRR
    srr-to-srx          Get SRX for a SRR
    srs-to-gsm          Get GSM for a SRS
    srs-to-srx          Get SRX for a SRS
    srx-to-srp          Get SRP for a SRX
    srx-to-srr          Get SRR for a SRX
    srx-to-srs          Get SRS for a SRX
```
## Quickstart
A Google Colaboratory version of the most commonly used commands is available in this [Colab Notebook](https://colab.research.google.com/drive/1C60V-jkcNZiaCra_V5iEyFs318jgVoUR). Note that this requires only an active internet connection (no additional downloads are made).
The following notebooks document all the possible features of `pysradb`:
- [Python API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/01.Python-API_demo.ipynb)
- [Downloading datasets from SRA - command line](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/02.Commandline_download.ipynb)
- [Parallel download of multiple datasets - Python API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/03.ParallelDownload.ipynb)
- [Converting SRA-to-fastq - command line (requires conda)](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/04.SRA_to_fastq_conda.ipynb)
- [Downloading subsets of a project - Python API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/05.Downloading_subsets_of_a_project.ipynb)
- [Download BAMs](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/06.Download_BAMs.ipynb)
- [Metadata for multiple SRPs](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/07.Multiple_SRPs.ipynb)
- [Multithreaded fastq downloads using Aspera Client](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/08.pysradb_ascp_multithreaded.ipynb)
- [Searching SRA/GEO/ENA](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/09.Query_Search.ipynb)
## Installation
To install the stable version using `pip`:

```bash
pip install pysradb
```

Alternatively, if you use conda:

```bash
conda install -c bioconda pysradb
```

This step will install all the dependencies. If you have an existing environment with many pre-installed packages, conda might be [slow](https://github.com/bioconda/bioconda-recipes/issues/13774). Please consider creating a new environment for pysradb:

```bash
conda create -c bioconda -n pysradb python=3.10 pysradb
```
### Dependencies
- pandas
- requests
- tqdm
- xmltodict
### Installing pysradb in development mode
```bash
git clone https://github.com/saketkc/pysradb.git
cd pysradb && pip install -r requirements.txt
pip install -e .
```
## Using pysradb
### Obtaining SRA metadata
```console
$ pysradb metadata SRP000941 | head
```

| study_accession | experiment_accession | experiment_title | experiment_desc | organism_taxid | organism_name | library_strategy | library_source | library_selection | sample_accession | sample_title | instrument | total_spots | total_size | run_accession | run_total_spots | run_total_bases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SRP000941 | SRX056722 | Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells | Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells | 9606 | Homo sapiens | ChIP-Seq | GENOMIC | ChIP | SRS184466 | | Illumina HiSeq 2000 | 26900401 | 531654480 | SRR179707 | 26900401 | 807012030 |
| SRP000941 | SRX027889 | Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells | Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells | 9606 | Homo sapiens | ChIP-Seq | GENOMIC | ChIP | SRS116481 | | Illumina Genome Analyzer II | 37528590 | 779578968 | SRR067978 | 37528590 | 1351029240 |
| SRP000941 | SRX027888 | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | 9606 | Homo sapiens | ChIP-Seq | GENOMIC | RANDOM | SRS116483 | | Illumina Genome Analyzer II | 13603127 | 3232309537 | SRR067977 | 13603127 | 489712572 |
| SRP000941 | SRX027887 | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | 9606 | Homo sapiens | ChIP-Seq | GENOMIC | RANDOM | SRS116562 | | Illumina Genome Analyzer II | 22430523 | 506327844 | SRR067976 | 22430523 | 807498828 |
| SRP000941 | SRX027886 | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | 9606 | Homo sapiens | ChIP-Seq | GENOMIC | RANDOM | SRS116560 | | Illumina Genome Analyzer II | 15342951 | 301720436 | SRR067975 | 15342951 | 552346236 |
| SRP000941 | SRX027885 | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | 9606 | Homo sapiens | ChIP-Seq | GENOMIC | RANDOM | SRS116482 | | Illumina Genome Analyzer II | 39725232 | 851429082 | SRR067974 | 39725232 | 1430108352 |
| SRP000941 | SRX027884 | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | 9606 | Homo sapiens | ChIP-Seq | GENOMIC | RANDOM | SRS116481 | | Illumina Genome Analyzer II | 32633277 | 544478483 | SRR067973 | 32633277 | 1174797972 |
| SRP000941 | SRX027883 | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | 9606 | Homo sapiens | ChIP-Seq | GENOMIC | RANDOM | SRS004118 | | Illumina Genome Analyzer II | 22150965 | 3262293717 | SRR067972 | 9357767 | 336879612 |
| SRP000941 | SRX027883 | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | Reference Epigenome: ChIP-Seq Input from hESC H1 Cells | 9606 | Homo sapiens | ChIP-Seq | GENOMIC | RANDOM | SRS004118 | | Illumina Genome Analyzer II | 22150965 | 3262293717 | SRR067971 | 12793198 | 460555128 |
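An experiment (SRX) can contain several runs (SRR), as SRX027883 does in the table above, so per-experiment figures come from aggregating the run-level columns. The sketch below is illustrative and not part of pysradb: it assumes the metadata output was saved as tab-separated text with the column names shown above, and it keeps only the columns it needs from three rows of that table. Summing `run_total_spots` for SRX027883 reproduces its `total_spots` value of 22150965.

```python
import csv
import io
from collections import defaultdict

# Three rows from the SRP000941 table above, trimmed to the columns used here.
# Assumption: the saved metadata output is tab-separated with a header row.
METADATA_TSV = (
    "experiment_accession\ttotal_spots\trun_accession\trun_total_spots\n"
    "SRX027884\t32633277\tSRR067973\t32633277\n"
    "SRX027883\t22150965\tSRR067972\t9357767\n"
    "SRX027883\t22150965\tSRR067971\t12793198\n"
)


def spots_per_experiment(tsv_text: str) -> dict[str, int]:
    """Sum run_total_spots over every run (SRR) belonging to each experiment (SRX)."""
    totals: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        totals[row["experiment_accession"]] += int(row["run_total_spots"])
    return dict(totals)


print(spots_per_experiment(METADATA_TSV))
# {'SRX027884': 32633277, 'SRX027883': 22150965}
```

Reading by column name rather than position keeps the sketch working if extra columns appear in the output.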
### Obtaining detailed SRA metadata
```console
$ pysradb metadata SRP075720 --detailed | head
```

| study_accession | experiment_accession | experiment_title | experiment_desc | organism_taxid | organism_name | library_strategy | library_source | library_selection | sample_accession | sample_title | instrument | total_spots | total_size | run_accession | run_total_spots | run_total_bases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SRP075720 | SRX1800476 | GSM2177569: Kcng4_2la_H9; Mus musculus; RNA-Seq | GSM2177569: Kcng4_2la_H9; Mus musculus; RNA-Seq | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC | cDNA | SRS1467643 | | Illumina HiSeq 2500 | 2547148 | 97658407 | SRR3587912 | 2547148 | 127357400 |
| SRP075720 | SRX1800475 | GSM2177568: Kcng4_2la_H8; Mus musculus; RNA-Seq | GSM2177568: Kcng4_2la_H8; Mus musculus; RNA-Seq | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC | cDNA | SRS1467642 | | Illumina HiSeq 2500 | 2676053 | 101904264 | SRR3587911 | 2676053 | 133802650 |
| SRP075720 | SRX1800474 | GSM2177567: Kcng4_2la_H7; Mus musculus; RNA-Seq | GSM2177567: Kcng4_2la_H7; Mus musculus; RNA-Seq | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC | cDNA | SRS1467641 | | Illumina HiSeq 2500 | 1603567 | 61729014 | SRR3587910 | 1603567 | 80178350 |
| SRP075720 | SRX1800473 | GSM2177566: Kcng4_2la_H6; Mus musculus; RNA-Seq | GSM2177566: Kcng4_2la_H6; Mus musculus; RNA-Seq | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC | cDNA | SRS1467640 | | Illumina HiSeq 2500 | 2498920 | 94977329 | SRR3587909 | 2498920 | 124946000 |
| SRP075720 | SRX1800472 | GSM2177565: Kcng4_2la_H5; Mus musculus; RNA-Seq | GSM2177565: Kcng4_2la_H5; Mus musculus; RNA-Seq | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC | cDNA | SRS1467639 | | Illumina HiSeq 2500 | 2226670 | 83473957 | SRR3587908 | 2226670 | 111333500 |
| SRP075720 | SRX1800471 | GSM2177564: Kcng4_2la_H4; Mus musculus; RNA-Seq | GSM2177564: Kcng4_2la_H4; Mus musculus; RNA-Seq | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC | cDNA | SRS1467638 | | Illumina HiSeq 2500 | 2269546 | 87486278 | SRR3587907 | 2269546 | 113477300 |
| SRP075720 | SRX1800470 | GSM2177563: Kcng4_2la_H3; Mus musculus; RNA-Seq | GSM2177563: Kcng4_2la_H3; Mus musculus; RNA-Seq | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC | cDNA | SRS1467636 | | Illumina HiSeq 2500 | 2333284 | 88669838 | SRR3587906 | 2333284 | 116664200 |
| SRP075720 | SRX1800469 | GSM2177562: Kcng4_2la_H2; Mus musculus; RNA-Seq | GSM2177562: Kcng4_2la_H2; Mus musculus; RNA-Seq | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC | cDNA | SRS1467637 | | Illumina HiSeq 2500 | 2071159 | 79689296 | SRR3587905 | 2071159 | 103557950 |
| SRP075720 | SRX1800468 | GSM2177561: Kcng4_2la_H1; Mus musculus; RNA-Seq | GSM2177561: Kcng4_2la_H1; Mus musculus; RNA-Seq | 10090 | Mus musculus | RNA-Seq | TRANSCRIPTOMIC | cDNA | SRS1467635 | | Illumina HiSeq 2500 | 2321657 | 89307894 | SRR3587904 | 2321657 | 116082850 |
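The ratio `run_total_bases / run_total_spots` gives the average number of bases per spot: the read length for single-end runs, or the combined length of both mates for paired-end runs. For the SRP075720 runs above it works out to exactly 50. A small sketch, with the numbers copied from the table (the helper name is illustrative, not a pysradb function):

```python
# (run_accession, run_total_spots, run_total_bases) from the SRP075720 table above.
RUNS = [
    ("SRR3587912", 2547148, 127357400),
    ("SRR3587911", 2676053, 133802650),
    ("SRR3587910", 1603567, 80178350),
]


def bases_per_spot(spots: int, bases: int) -> float:
    """Average bases per spot: the read length for single-end runs,
    or the combined length of both mates for paired-end runs."""
    if spots <= 0:
        raise ValueError("a run must have at least one spot")
    return bases / spots


for run, spots, bases in RUNS:
    print(run, bases_per_spot(spots, bases))
# SRR3587912 50.0
# SRR3587911 50.0
# SRR3587910 50.0
```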
### Converting SRP to GSE
```console
$ pysradb srp-to-gse SRP075720
study_accession study_alias
SRP075720       GSE81903
```
### Converting GSM to SRP
```console
$ pysradb gsm-to-srp GSM2177186
experiment_alias study_accession
GSM2177186       SRP075720
```
### Converting GSM to GSE
```console
$ pysradb gsm-to-gse GSM2177186
experiment_alias study_alias
GSM2177186       GSE81903
```
### Converting GSM to SRX
```console
$ pysradb gsm-to-srx GSM2177186
experiment_alias experiment_accession
GSM2177186       SRX1800089
```
### Converting GSM to SRR
```console
$ pysradb gsm-to-srr GSM2177186
experiment_alias run_accession
GSM2177186       SRR3587529
```
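The conversion subcommands print a header line followed by one row per mapping, and (per the 0.10.0 notes in the history below) the first two columns hold the conversion. The sketch below turns that output into a dictionary. `parse_conversion` and `convert` are hypothetical helpers, not part of pysradb; `convert` assumes the `pysradb` CLI is on `PATH` and that network access is available. Values are lists because one GSM can map to several SRRs.

```python
import subprocess
from collections import defaultdict


def parse_conversion(output: str) -> dict[str, list[str]]:
    """Map each source accession to its converted accession(s).

    Assumes a header line followed by whitespace-separated rows; only the
    first two columns are used and any later columns are ignored.
    """
    mapping: dict[str, list[str]] = defaultdict(list)
    lines = [line for line in output.splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]].append(fields[1])
    return dict(mapping)


def convert(subcommand: str, accession: str) -> dict[str, list[str]]:
    """Run a conversion subcommand such as "gsm-to-srr" and parse its output."""
    result = subprocess.run(
        ["pysradb", subcommand, accession],
        capture_output=True, text=True, check=True,
    )
    return parse_conversion(result.stdout)


example = "experiment_alias run_accession\nGSM2177186 SRR3587529\n"
print(parse_conversion(example))  # {'GSM2177186': ['SRR3587529']}
```

Taking only the first two tokens keeps the parser indifferent to the extra columns the newer subcommands print, as long as the accessions themselves contain no whitespace.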
### Downloading supplementary files from GEO
```console
$ pysradb download -g GSE161707
```
### Downloading an entire SRA/ENA project (multithreaded)
pysradb can download datasets from SRA in parallel. For example, to download a project using 8 threads:

```console
$ pysradb download -y -t 8 --out-dir ./pysradb_downloads -p SRP063852
```
Downloads are organized by SRP/SRX/SRR mimicking the hierarchy of SRA projects.
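To reuse the same idea with your own fetcher, the sketch below builds one path per run as `<out_dir>/<SRP>/<SRX>/<SRR>` from metadata rows and dispatches a caller-supplied `fetch` function over a thread pool. This is not pysradb's download code: `run_paths`, `fetch_all` and `fetch` are hypothetical, and the sketch does not assume whether pysradb uses the SRR level as a folder or as a file name.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def run_paths(rows: Iterable[dict[str, str]], out_dir: str) -> list[Path]:
    """Build <out_dir>/<SRP>/<SRX>/<SRR> for each metadata row (hypothetical layout)."""
    return [
        Path(out_dir, row["study_accession"], row["experiment_accession"], row["run_accession"])
        for row in rows
    ]


def fetch_all(targets: list[Path], fetch: Callable[[Path], None], threads: int = 8) -> None:
    """Call the caller-supplied `fetch` once per target on `threads` worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Consuming the results re-raises an exception from a worker call, if any.
        list(pool.map(fetch, targets))


# Two runs of SRX027883 from the SRP000941 table; `print` stands in for a real fetcher.
rows = [
    {"study_accession": "SRP000941", "experiment_accession": "SRX027883", "run_accession": "SRR067972"},
    {"study_accession": "SRP000941", "experiment_accession": "SRX027883", "run_accession": "SRR067971"},
]
fetch_all(run_paths(rows, "pysradb_downloads"), print, threads=8)
```

Both runs share the parent `pysradb_downloads/SRP000941/SRX027883`, which mirrors the project/experiment/run hierarchy described above.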
### Downloading only certain samples of interest
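```console
$ pysradb metadata SRP000941 --detailed | grep -E 'study|RNA-Seq' | pysradb download
```

This downloads all RNA-seq samples from this project. The `study` pattern keeps the header line (which contains `study_accession`), so `pysradb download` still receives the column names.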
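The `grep` step matches "RNA-Seq" anywhere on a line, including inside titles such as those of SRP075720 above. A column-aware alternative filters on `library_strategy` only. The sketch below assumes tab-separated metadata text with a header row; `keep_strategy` is a hypothetical helper, not a pysradb function.

```python
import csv
import io


def keep_strategy(tsv_text: str, strategy: str = "RNA-Seq") -> str:
    """Return the header plus the rows whose library_strategy equals `strategy`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if reader.fieldnames is None:  # empty input: nothing to filter
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()  # keep column names for whatever consumes the output
    for row in reader:
        if row["library_strategy"] == strategy:
            writer.writerow(row)
    return out.getvalue()
```

Because the returned text keeps the header and the tab-separated layout, it should be usable in place of the `grep` output when saved to a file or piped onward.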
### Ultrafast fastq downloads
With [aspera-client](https://downloads.asperasoft.com/en/downloads/8?list) installed, `pysradb` can perform ultra-fast downloads. To download all original fastqs using 8 threads:

```console
$ pysradb download -t 8 --use_ascp -p SRP002605
```
Refer to the notebook for [(shallow) time benchmarks](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/08.pysradb_ascp_multithreaded.ipynb).
## Publication
> [pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive](https://f1000research.com/articles/8-532/v1)
>
> Presentation slides from BOSC (ISMB-ECCB) 2019: <https://f1000research.com/slides/8-1183>
## Citation
Choudhary, Saket. "pysradb: A Python Package to Query next-Generation Sequencing Metadata and Data from NCBI Sequence Read Archive." F1000Research, vol. 8, F1000 (Faculty of 1000 Ltd), Apr. 2019, p. 532 (<https://f1000research.com/articles/8-532/v1>)
```bibtex
@article{Choudhary2019,
  doi = {10.12688/f1000research.18676.1},
  url = {https://doi.org/10.12688/f1000research.18676.1},
  year = {2019},
  month = apr,
  publisher = {F1000 (Faculty of 1000 Ltd)},
  volume = {8},
  pages = {532},
  author = {Saket Choudhary},
  title = {pysradb: A {P}ython package to query next-generation sequencing metadata and data from {NCBI} {S}equence {R}ead {A}rchive},
  journal = {F1000Research}
}
```
Zenodo archive: <https://zenodo.org/badge/latestdoi/159590788>
Zenodo DOI: 10.5281/zenodo.2306881
## Questions?
Open an [issue](https://github.com/saketkc/pysradb/issues) or join our [Slack Channel](https://join.slack.com/t/pysradb/shared_invite/zt-f01jndpy-KflPu3Be5Aq3FzRh5wj1Ug).
# History
## 2.2.0 (2023-09-17)
Add support for Biosamples and bioproject [#199](https://github.com/saketkc/pysradb/pull/198)
Use `retmode=xml` for GEO search [#200](https://github.com/saketkc/pysradb/pull/200)
Documentation fixes
## 2.1.0 (2023-05-16)
Fix for `gse-to-srp` returning unrequested GSEs ([#186](https://github.com/saketkc/pysradb/issues/190))
Fix for `download` using `public_urls`
Fix for `gsm-to-srx` returning false positives ([#165](https://github.com/saketkc/pysradb/issues/165))
Fix for inconsistent delimiters when metadata is printed to the terminal ([#147](https://github.com/saketkc/pysradb/issues/147))
ENA search is currently broken because of an API change
## 2.0.2 (2023-04-09)
Fix for `gse-to-srp` to handle cases where a project is missing but SRXs are returned ([#186](https://github.com/saketkc/pysradb/issues/186))
Fix `gse-to-gsm` ([#187](https://github.com/saketkc/pysradb/issues/187))
## 2.0.1 (2023-03-18)
Fix for `pysradb download` using `public_url`
Fix for SRX -> SRR and related conversions ([#183](https://github.com/saketkc/pysradb/pull/183))
## 2.0.0 (2023-02-23)
BREAKING change: overhaul of how URLs and associated metadata are returned (not backward compatible); all column names are now lowercase by default
Fix extra space in "organism_taxid" column
Added support for experiment attributes ([#89](https://github.com/saketkc/pysradb/issues/89#issuecomment-1439319532))
## 1.4.2 (06-17-2022)
Fix ENA fastq fetching ([#163](https://github.com/saketkc/pysradb/issues/163))
## 1.4.1 (06-04-2022)
Fix for fetching alternative URLs
## 1.4.0 (06-04-2022)
Added ability to fetch alternative URLs (GCP/AWS) for metadata ([#161](https://github.com/saketkc/pysradb/issues/161))
Fix for xmltodict 0.13.0 no longer defaulting to OrderedDict ([#159](https://github.com/saketkc/pysradb/pull/159))
Fix for missing experiment model and description in metadata ([#160](https://github.com/saketkc/pysradb/issues/160))
## 1.3.0 (02-18-2022)
Add `study_title` to `--detailed` flag ([#152](https://github.com/saketkc/pysradb/issues/152))
Fix `KeyError` in `metadata` where some new IDs do not have any metadata ([#151](https://github.com/saketkc/pysradb/issues/151))
## 1.2.0 (01-10-2022)
Do not exit if a query returns no hits ([#149](https://github.com/saketkc/pysradb/pull/149))
## 1.1.0 (12-12-2021)
Fixed `gsm-to-gse` failure ([#128](https://github.com/saketkc/pysradb/pull/128))
Fixed case sensitivity bug for ENA search ([#144](https://github.com/saketkc/pysradb/pull/144))
Fixed publication date bug for search ([#146](https://github.com/saketkc/pysradb/pull/146))
Added support for downloading data from GEO: `pysradb download -g <GSE>` ([#129](https://github.com/saketkc/pysradb/pull/129))
## 1.0.1 (01-10-2021)
Dropped Python 3.6 since pandas 1.2 does not support it
## 1.0.0 (01-09-2021)
Retired metadb and SRAdb based search through CLI - everything defaults to SRAweb
SRAweb now supports [search](https://saket-choudhary.me/pysradb/quickstart.html#search)
`N/A` is now replaced with `pd.NA`
Two new fields in `--detailed`: `instrument_model` and `instrument_model_desc` [#75](https://github.com/saketkc/pysradb/issues/75)
Updated documentation
## 0.11.1 (09-18-2020)
`library_layout` is now included in metadata #56
`--detailed` unifies columns for ENA fastq links instead of appending _x/_y #59
bugfix for parsing namespace in xml outputs #65
XML errors from NCBI are now handled more gracefully #69
Documentation and dependency updates
## 0.11.0 (09-04-2020)
`pysradb download` now supports multiple threads for parallel downloads
`pysradb download` also supports ultra-fast downloads of FASTQs from ENA using aspera-client
## 0.10.3 (03-26-2020)
Added test cases for SRAweb
API limit exceeding errors are automagically handled
Bug fixes for GSE <=> SRR
Bug fix for metadata - supports multiple SRPs
Contributors
Dibya Gautam
Marius van den Beek
## 0.10.2 (02-05-2020)
Bug fix: Handle API-rate limit exceeding => Retries
Enhancement: 'Alternatives' URLs are now part of `--detailed`
## 0.10.1 (02-04-2020)
Bug fix: Handle Python 3.6, where `subprocess.run` does not support `capture_output`
## 0.10.0 (01-31-2020)
All conversion subcommands (e.g. srx-to-srr, srx-to-srs) now print additional columns; the first two columns represent the relevant conversion
Fixed a bug in fetching entries with a single efetch record
## 0.9.9 (01-15-2020)
Major fix: some SRRs would go missing as the experiment dict was being created only once per SRR (See #15)
Features: More detailed metadata by default in the SRAweb mode
See notebook: <https://colab.research.google.com/drive/1C60V->
## 0.9.7 (01-20-2020)
Feature: instrument, run size and total spots are now printed in the metadata by default (SRAweb mode only)
Issue: Fixed an issue with srapath failing on SRP. srapath is now run on individual SRRs.
## 0.9.6 (07-20-2019)
Introduced `SRAweb` to perform queries over the web if the SQLite database is missing or does not contain the relevant record.
## 0.9.0 (02-27-2019)
### Others
This release completely changes the command line interface replacing click with argparse (<https://github.com/saketkc/pysradb/pull/3>)
Removed stale Python 2-compatible code
## 0.8.0 (02-26-2019)
### New methods/functionality
`srr-to-gsm`: convert SRR to GSM
SRAmetadb.sqlite.gz file is deleted by default after extraction
When SRAmetadb is not found, confirmation is sought before downloading
Confirmation option before SRA downloads
### Bugfix
download() works with wget
### Others
`--out_dir` is now `--out-dir`
## 0.7.1 (02-18-2019)
Important: Python2 is no longer supported. Please consider moving to Python3.
### Bugfix
Included docs in the index which were missed in the previous release
## 0.7.0 (02-08-2019)
### New methods/functionality
`gsm-to-srr`: convert GSM to SRR
`gsm-to-srx`: convert GSM to SRX
`gsm-to-gse`: convert GSM to GSE
### Renamed methods
The following command line options have been renamed and the changes are not compatible with the 0.6.0 release:
`sra-metadata` -> `metadata`.
`sra-search` -> `search`.
`srametadb` -> `metadb`.
## 0.6.0 (12-25-2018)
### Bugfix
Fixed bugs introduced in 0.5.0 by API changes, where multiple redundant columns were output in `sra-metadata`
### New methods/functionality
`download` now allows piped inputs
## 0.5.0 (12-24-2018)
### New methods/functionality
Support for filtering by SRX Id for SRA downloads.
`srr_to_srx`: Convert SRR to SRX/SRP
`srp_to_srx`: Convert SRP to SRX
Stripped down `sra-metadata` to give minimal information
Added `--assay`, `--desc`, `--detailed` flags for `sra-metadata`
Improved table printing on terminal
## 0.4.2 (12-16-2018)
### Bugfix
Fixed unicode error in tests for Python2
## 0.4.0 (12-12-2018)
### New methods/functionality
Added a new `BASEdb` class to handle common database connections
Initial support for GEOmetadb through GEOdb class
Initial support for a command line interface:
- `download`: Download SRA project (SRPnnnn)
- `gse-metadata`: Fetch metadata for GEO ID (GSEnnnn)
- `gse-to-gsm`: Get GSM(s) for GSE
- `gsm-metadata`: Fetch metadata for GSM ID (GSMnnnn)
- `sra-metadata`: Fetch metadata for SRA project (SRPnnnn)
Added three separate notebooks for SRAdb, GEOdb, CLI usage
## 0.3.0 (12-05-2018)
### New methods/functionality
`sample_attribute` and `experiment_attribute` are now included by default in the df returned by `sra_metadata()`
`expand_sample_attribute_columns`: expand metadata dataframe based on attributes in the `sample_attribute` column
New methods to guess cell/tissue/strain: `guess_cell_type()`/`guess_tissue_type()`/`guess_strain_type()`
Improved README and usage instructions
## 0.2.2 (12-03-2018)
### New methods/functionality
`search_sra()` allows full-text search on SRA metadata.
## 0.2.0 (12-03-2018)
### Renamed methods
The following methods have been renamed and the changes are not compatible with 0.1.0 release:
`get_query()` -> `query()`.
`sra_convert()` -> `sra_metadata()`.
`get_table_counts()` -> `all_row_counts()`.
### New methods/functionality
`download_sradb_file()` makes fetching the `SRAmetadb.sqlite` file easy; wget is no longer required.
The `ftp` protocol is now supported besides `fsp`, and hence `aspera-client` is now optional. However, we strongly recommend `aspera-client` for faster downloads.
### Bug fixes
Silenced `SettingWithCopyWarning` by explicitly performing operations on a copy of the dataframe instead of the original.
Besides these, all methods now follow `numpydoc`-compatible documentation.
## 0.1.0 (12-01-2018)
First release on PyPI.