
A set of helper functions for working with biological metadata from the SRA.


# Biometalib

Biometalib is a set of useful libraries and tools for working with SRA biological metadata.

## Installation

This library is designed for Python 3 and can be installed with pip or conda.

### Pip

Biometalib can be installed using pip.

```bash
pip install biometalib
```

Alternatively, the latest development version can be installed from GitHub with pip.

```bash
pip install git+https://github.com/jfear/sramongo
pip install git+https://github.com/jfear/biometalib
```

### Conda [Suggested]

First make sure you have a working installation of Anaconda. I suggest [Miniconda](https://conda.io/miniconda.html).

```bash
conda install -c jfear biometalib
```

## Attribute Selector

Attribute selector is a helper script for choosing which attributes to focus on for a project. The biological metadata submitted by users contains many different kinds of attributes. Some are misspellings or variant forms of the same word, and others are unique to a single project. This tool lets you curate these attribute columns quickly. Attribute selector stores attribute decisions in a YAML-formatted file.

In the YAML file, selected attributes are the keys. When multiple attributes are merged into a single selected attribute, they are stored as that key's values. For example:

```
sex:
    - sex
    - Sex
    - gender
```

Here the selected attribute sex has the attributes sex, Sex, and gender associated with it. There is also a special selected attribute, ignore, which stores a list of attributes that you want to ignore.
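To illustrate how a selection file like this could be applied downstream, here is a minimal sketch in Python. It is not part of Biometalib: the function names `build_lookup` and `normalize`, the `submitter_id` and `tissue` attributes, and the choice to leave unlisted attributes untouched are all assumptions made for illustration. The dictionary stands in for the result of parsing the YAML file.

```python
# Hypothetical parsed form of the YAML file:
# selected attribute -> list of raw attribute names merged into it.
selection = {
    "sex": ["sex", "Sex", "gender"],
    "ignore": ["submitter_id"],  # assumed example of an ignored attribute
}


def build_lookup(selection):
    """Invert the mapping: raw attribute name -> selected attribute."""
    lookup = {}
    for selected, raw_names in selection.items():
        for raw in raw_names:
            lookup[raw] = selected
    return lookup


def normalize(record, lookup):
    """Rename raw attributes in one metadata record and drop ignored ones.

    Attributes missing from the lookup are kept unchanged (an assumption).
    """
    cleaned = {}
    for name, value in record.items():
        selected = lookup.get(name, name)
        if selected == "ignore":
            continue
        cleaned[selected] = value
    return cleaned


lookup = build_lookup(selection)
record = {"Sex": "male", "submitter_id": "x1", "tissue": "ovary"}
cleaned = normalize(record, lookup)
```

In this sketch, `Sex` is renamed to `sex`, `submitter_id` is dropped, and `tissue` passes through as-is.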

Using the BioSample selection sheet, I have created a starting YAML file that can be used when running attribute_selector.

To run the attribute selector on my public version of the Biometa database, type:

```bash
# Download example YAML
$ wget -O my_attribute_selection.yaml https://raw.githubusercontent.com/jfear/biometalib/master/data/flybase_example.yaml

$ attribute_selector \
    --host mongo.geneticsunderground.com \
    --port 27022 \
    --db sra \
    --username sra \
    --password oliver \
    --authenticationDatabase user-data \
    --config my_attribute_selection.yaml
```

attribute_selector is an interactive command-line tool. It iterates over all attribute column names that are not already selected attributes in the YAML file. The current attribute is displayed in red. At the prompt you can type:

  • k to set the current attribute as a selected attribute [keep]

  • r to rename the current attribute; the current attribute is stored as a value of the renamed selected attribute [rename]

  • i to add the current attribute to the ignore list [ignore]

  • e show example values listed under the current attribute [example]

  • s show attributes with similar names (fuzzy string match). Here selected attributes will appear in yellow [similar]

  • n skip and go to the next attribute [next]

  • quit to exit the program and save your progress [quit]
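The effect of the prompt commands on the YAML mapping can be sketched as follows. This is a simplified illustration, not attribute_selector's actual implementation: `apply_command` and `similar_names` are hypothetical names, the mapping is a plain dictionary, and fuzzy matching is approximated with the standard library's `difflib`, which may differ from whatever matcher the tool uses.

```python
from difflib import get_close_matches


def apply_command(selection, attribute, command, new_name=None):
    """Update the selection mapping in place for one prompt answer."""
    if command == "k":
        # keep: the attribute becomes its own selected attribute
        selection.setdefault(attribute, []).append(attribute)
    elif command == "r":
        # rename: store the attribute as a value of the chosen selected attribute
        if not new_name:
            raise ValueError("rename needs a target selected attribute")
        selection.setdefault(new_name, []).append(attribute)
    elif command == "i":
        # ignore: add the attribute to the special ignore list
        selection.setdefault("ignore", []).append(attribute)
    elif command == "n":
        # next: leave the mapping unchanged
        pass
    else:
        raise ValueError(f"unknown command: {command!r}")
    return selection


def similar_names(attribute, candidates, cutoff=0.6):
    """Return up to five candidate names that loosely match the attribute."""
    return get_close_matches(attribute, candidates, n=5, cutoff=cutoff)


selection = {"sex": ["sex"]}
apply_command(selection, "Sex", "r", new_name="sex")
apply_command(selection, "tissue", "k")
apply_command(selection, "lab_id", "i")
matches = similar_names("gendr", ["gender", "tissue", "sex"])
```

Keeping the mapping as a plain dictionary means progress can be written back to the YAML file at any point, which fits the save-on-quit behaviour described above.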

