Python API for interacting with the Sample API system (MS version)

Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Natural Language
- English
Operating System
Programming Language
- Python
Topic
- Scientific/Engineering
- Scientific/Engineering :: Bio-Informatics

Project description

Sample Metadata

The sample-metadata system is a database that stores de-identified metadata.

There are three components to the sample-metadata system:

- A system-versioned MariaDB database,
- A Python web API to manage permissions and store frequently used queries,
- An installable Python library that wraps the Python web API (generated with OpenAPI generator).

Every resource in sample-metadata belongs to a project. All resources are access-controlled through membership of the Google groups $dataset-sample-metadata-main-{read,write}. Because group-membership identity checks are slow, the members of these Google groups are cached in a secret.

Structure

Database structure

Sample IDs

In an effort to reduce our dependency on potentially mutable external sample IDs with inconsistent formats, the sample-metadata server generates an internal sample ID for every sample. Internally these are incrementing integers, but externally they are transformed to carry a prefix and a checksum. The checksum guards against mistypes when sample IDs are transcribed, and makes it possible to quickly check whether a sample ID is valid.

NB: The prefix and checksums are modified per environment (production, development, local) to avoid duplicates from these environments.

For example, let's consider the production environment which uses the prefix of CPG and a checksum offset of 0:

A sample is given the internal ID 12345. We calculate its Luhn checksum to be 5 (with no offset applied), then concatenate the prefix, the internal ID and the checksum to give the final sample ID CPG123455.
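The worked example above can be reproduced with a short sketch. The function names here are hypothetical and are not taken from the sample-metadata code base. The way the per-environment offset is combined with the checksum is also an assumption (added modulo 10), since the text only states that the production offset is 0.

```python
def luhn_check_digit(number: int) -> int:
    """Return the Luhn check digit for a non-negative integer payload."""
    total = 0
    # Walk the digits right to left. The rightmost payload digit is doubled
    # because the check digit will be appended after it.
    for position, char in enumerate(reversed(str(number))):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def format_sample_id(internal_id: int, prefix: str = "CPG", offset: int = 0) -> str:
    """Build the external sample ID from the internal integer ID."""
    # ASSUMPTION: the environment offset shifts the check digit modulo 10.
    checksum = (luhn_check_digit(internal_id) + offset) % 10
    return f"{prefix}{internal_id}{checksum}"


def parse_sample_id(sample_id: str, prefix: str = "CPG", offset: int = 0) -> int:
    """Return the internal ID, or raise ValueError if the ID is malformed."""
    if not sample_id.startswith(prefix):
        raise ValueError(f"{sample_id!r} does not start with {prefix!r}")
    body = sample_id[len(prefix):]
    if len(body) < 2 or not body.isdigit():
        raise ValueError(f"{sample_id!r} has no numeric ID and checksum")
    internal_id = int(body[:-1])
    # Re-deriving the full ID catches both wrong checksums and leading zeros.
    if format_sample_id(internal_id, prefix, offset) != sample_id:
        raise ValueError(f"{sample_id!r} has an invalid checksum")
    return internal_id
```

With these definitions, format_sample_id(12345) gives CPG123455, matching the example above, and a single mistyped digit such as CPG123465 is rejected by parse_sample_id.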

Reporting sex

To avoid ambiguity in the reporting of gender, sex and karyotype, the sample-metadata system stores these values separately on the participant as:

- reported_gender (string, expected male | female | other values)
- reported_sex (follows pedigree convention: unknown=0 | null, male=1, female=2)
- inferred_karyotype (string, e.g. XX | XY | other karyotypes)

If you import a pedigree, the sex value is written to the reported_sex attribute.
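As a rough illustration of the three fields and the pedigree convention described above, the sketch below normalises a pedigree sex value into reported_sex. The class and function names are hypothetical, and this is not the importer used by the server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParticipantSexFields:
    """Hypothetical container for the three separately stored values."""

    reported_gender: Optional[str] = None     # e.g. "male", "female", other
    reported_sex: Optional[int] = None        # 0 or None unknown, 1 male, 2 female
    inferred_karyotype: Optional[str] = None  # e.g. "XX", "XY", other


def reported_sex_from_pedigree(value: str) -> int:
    """Map a pedigree sex column to reported_sex (1 male, 2 female, else 0)."""
    value = value.strip()
    return int(value) if value in ("1", "2") else 0


def participant_from_pedigree_sex(value: str) -> ParticipantSexFields:
    # Only reported_sex is written; gender and karyotype stay unset.
    return ParticipantSexFields(reported_sex=reported_sex_from_pedigree(value))
```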

Local development of SM

The recommended way to develop the sample-metadata system is to run a local copy of SM.

There have been some reported issues of running a local SM environment on an M1 mac.

You can run MariaDB either from a local installation or from within a Docker container. You can configure the MariaDB connection with environment variables.

Creating the environment

Dependencies for the sample-metadata API package are listed in setup.py. Additional dev requirements are listed in requirements-dev.txt, and packages for the server-side code are listed in requirements.txt.

To create the full dev environment, run:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install --editable .

Default DB set-up

These are the default values for the SM database connection. Please alter them if you use any different values when setting up the database.

export SM_DEV_DB_USER=root
export SM_DEV_DB_PASSWORD= # empty password
export SM_DEV_DB_HOST=127.0.0.1
export SM_DEV_DB_PORT=3306 # default mariadb port

Create the database in MariaDB (by default, we call it sm_dev). If you use a different database name, also set the following:

export SM_DEV_DB_NAME=sm_database_name

Sample-metadata stores all metadata in one database (previously: one database per project).

mysql -u root --execute 'CREATE DATABASE sm_dev'
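For reference, the defaults above could be read from the environment along these lines. This is a minimal sketch: the function name and the keys of the returned dictionary are illustrative choices, not the server's actual configuration code.

```python
import os
from typing import Mapping, Optional


def read_db_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the SM_DEV_DB_* settings, falling back to the defaults above."""
    env = os.environ if env is None else env
    return {
        "username": env.get("SM_DEV_DB_USER", "root"),
        "password": env.get("SM_DEV_DB_PASSWORD", ""),  # empty by default
        "host": env.get("SM_DEV_DB_HOST", "127.0.0.1"),
        "port": int(env.get("SM_DEV_DB_PORT", "3306")),  # default MariaDB port
        "dbname": env.get("SM_DEV_DB_NAME", "sm_dev"),
    }
```

Calling read_db_settings() with no arguments reads os.environ, while passing a plain dictionary makes the lookup easy to check without touching your shell environment.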

Download the mariadb-java-client and create the schema using liquibase:

pushd db/
wget https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/3.0.3/mariadb-java-client-3.0.3.jar
liquibase \
    --changeLogFile project.xml \
    --url jdbc:mariadb://localhost/sm_dev \
    --driver org.mariadb.jdbc.Driver \
    --classpath mariadb-java-client-3.0.3.jar \
    --username root \
    update
popd

Using the MariaDB Docker image

Pull the mariadb image:

docker pull mariadb

Run a mariadb container that will serve your database. The -p 3307:3306 option remaps the port to 3307 in case your local MySQL is already using 3306:

docker stop mysql-p3307  # stop and remove if the container already exists
docker rm mysql-p3307
# run with an empty root password
docker run -p 3307:3306 --name mysql-p3307 -e MYSQL_ALLOW_EMPTY_PASSWORD=true -d mariadb

mysql --host=127.0.0.1 --port=3307 -u root -e 'CREATE DATABASE sm_dev;'
mysql --host=127.0.0.1 --port=3307 -u root -e 'show databases;'

Go into the db/ subdirectory, download the mariadb-java-client and create the schema using liquibase:

pushd db/
wget https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/3.0.3/mariadb-java-client-3.0.3.jar
liquibase \
    --changeLogFile project.xml \
    --url jdbc:mariadb://127.0.0.1:3307/sm_dev \
    --driver org.mariadb.jdbc.Driver \
    --classpath mariadb-java-client-3.0.3.jar \
    --username root \
    update
popd

Finally, make sure you configure the server (using the environment variables) to point to your local MariaDB server:

export SM_DEV_DB_PORT=3307

Running the server

You'll want to set the following environment variables (permanently) in your local development environment.

# ensures the SWAGGER page (localhost:8000/docs) points to your local environment
export SM_ENVIRONMENT=LOCAL
# skips permission checks in your local environment
export SM_ALLOWALLACCESS=true

# start the server
python3 -m api.server
# OR
# uvicorn --port 8000 --host 0.0.0.0 api.server:app

In a different terminal, execute the following request to create a new project called 'dev'

curl -X 'PUT' \
  'http://localhost:8000/api/v1/project/?name=dev&dataset=dev&gcp_id=dev&create_test_project=false' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)"
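The same request can be built from Python using only the standard library. This sketch mirrors the curl call above; the function name is hypothetical, and the identity token is passed in as an argument (for example, the output of gcloud auth print-identity-token).

```python
import urllib.parse
import urllib.request


def build_create_project_request(
    name: str,
    dataset: str,
    gcp_id: str,
    token: str,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates a project."""
    query = urllib.parse.urlencode(
        {
            "name": name,
            "dataset": dataset,
            "gcp_id": gcp_id,
            "create_test_project": "false",
        }
    )
    return urllib.request.Request(
        f"{base_url}/api/v1/project/?{query}",
        method="PUT",
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends the request to the running server.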

Quickstart: Generate and install the installable API

It's best to do this with an already running server:

python3 regenerate_api.py \
    && pip install .

Debugging the server in VSCode

VSCode allows you to debug Python modules. We could debug the web API at api/server.py with the following launch.json:

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "API server",
      "type": "python",
      "request": "launch",
      "module": "api.server"
    }
  ]
}

We could now place breakpoints on the sample route (ie: api/routes/sample.py), and debug requests as they come in.

Developing the UI

# Ensure you have already started SM locally on your computer, then open the UI in another tab.
# This will automatically proxy requests to the server.
cd web
npm install
npm start

Unauthenticated access

You'll want to set the SM_LOCALONLY_DEFAULTUSER environment variable along with SM_ALLOWALLACCESS to allow access to a local sample-metadata server without providing a bearer token. This lets you test the front-end components that access data. On the production instance, this happens automatically through the Google identity-aware proxy.

export SM_ALLOWALLACCESS=1
export SM_LOCALONLY_DEFAULTUSER=$(whoami)
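To make the interaction between the two variables concrete, here is a hedged sketch of the rule described above: a default user applies only when both variables are set. It is not the server's implementation. The function name is hypothetical, and the set of accepted truthy values is an assumption based on the values 1 and true used in this document.

```python
from typing import Mapping, Optional

# ASSUMPTION: these spellings count as "enabled"; the server may differ.
TRUTHY = {"1", "true", "yes"}


def local_default_user(env: Mapping[str, str]) -> Optional[str]:
    """Return the fallback user for token-less requests, or None."""
    allow_all = env.get("SM_ALLOWALLACCESS", "").strip().lower() in TRUTHY
    default_user = env.get("SM_LOCALONLY_DEFAULTUSER", "").strip()
    if allow_all and default_user:
        return default_user
    return None
```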

OpenAPI and Swagger

The Web API uses apispec with OpenAPI3 annotations on each route to describe interactions with the server. We can generate a swagger UI and an installable python module based on these annotations.

The web API exposes this schema in two ways:

- Swagger UI: http://localhost:8000/docs
  - You can use this to construct requests to the server.
  - Make sure you fill in the Bearer token (at the top right).
- OpenAPI schema: http://localhost:8000/schema.json
  - Returns JSON containing the full OpenAPI 3 compliant schema.
  - You could put this into the Swagger editor to see the same "Swagger UI" that /docs exposes.
  - We generate the sample_metadata installable Python API based on this schema.
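Because the schema is plain JSON, it can also be inspected directly. The sketch below lists the operations declared in an OpenAPI 3 document. The helper names are hypothetical, and fetch_schema assumes a server is running locally at the URL given above.

```python
import json
import urllib.request
from typing import List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(schema: dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI 3 schema, sorted by path."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


def fetch_schema(url: str = "http://localhost:8000/schema.json") -> dict:
    """Download the schema from a running local server."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

For example, list_operations(fetch_schema()) gives a quick overview of the routes the generated Python API will wrap.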

Generating the installable API

The installable API is automatically generated through the package.yml GitHub action and uploaded to PyPI.

To generate the Python API you'll need to install openapi-generator v5.x.x.

To install a specific version of the openapi-generator, do the following:

npm install @openapitools/openapi-generator-cli -g
openapi-generator-cli version-manager set 5.3.0

Then set your environment variable OPENAPI_COMMAND to the following. You can also add an alias to your ~/.bash_profile or equivalent for running in the terminal.

export OPENAPI_COMMAND="npx @openapitools/openapi-generator-cli"
alias openapi-generator="npx @openapitools/openapi-generator-cli"

You could generate the installable API and install it with pip by running:

# this will start the api.server, so make sure you have the dependencies installed,
python regenerate_api.py \
    && pip install .

Or you can build the Docker image and tell regenerate_api.py to use it:

# SM_DOCKER is a known env variable to regenerate_api.py
export SM_DOCKER="cpg/sample-metadata-server:dev"
docker build --build-arg SM_ENVIRONMENT=local -t $SM_DOCKER -f deploy/api/Dockerfile .
python regenerate_api.py

Deployment

The sample-metadata server

You'll want to complete the following steps:

- Ensure there is a database created for each project (with the database name being the project name),
- Ensure there is a secret at projects/sample_metadata/secrets/databases/versions/latest containing an array of objects with the keys dbname, host, port, username and password,
- Ensure google-cloud is installed.
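The expected shape of the databases secret can be checked before deployment with a small validator. This is only a sketch under the assumption that the secret payload is a JSON string; the function name is hypothetical and the server may parse the secret differently.

```python
import json
from typing import Dict, List

REQUIRED_KEYS = {"dbname", "host", "port", "username", "password"}


def parse_database_secret(payload: str) -> List[Dict]:
    """Parse the secret and verify every entry has the required keys."""
    entries = json.loads(payload)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of database configurations")
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not an object")
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {index} is missing keys: {sorted(missing)}")
    return entries
```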

export SM_ENVIRONMENT='PRODUCTION'

# OR, point to the dev instance with
export SM_ENVIRONMENT='DEVELOPMENT'

Release history

This version

0.7.1

Nov 9, 2022

0.7.0

Sep 23, 2022

0.6.8

Aug 5, 2022

0.6.7

Jul 6, 2022

0.6.5

Jun 17, 2022

0.6.4

Jun 3, 2022

0.6.3

May 25, 2022

0.6.2

May 25, 2022

0.6.1

May 25, 2022

0.6.0

May 23, 2022

Download files

Source Distribution

sample-metadata-ms-0.7.1.tar.gz (108.0 kB)

Uploaded Nov 9, 2022 Source

Hashes for sample-metadata-ms-0.7.1.tar.gz
Algorithm	Hash digest
SHA256	`5d7d4ab4949da8a5d04d757da6e605960fa7d78b8f34a19dcdb6b756a3d6f59e`
MD5	`d8a7b238c3a36544b2b388ce460e0598`
BLAKE2b-256	`9aa411cef6f2476a5f239b0182874c0b91c254415a2af3927812f2efe60f8ea6`