Extend dtool-lookup-server with the ability to filter by annotations

Introduction

This dtool-lookup-server plugin adds the ability to get an overview of the datasets a user has access to, based on how those datasets have been annotated with key/value pairs.

The purpose of this API is to give users an overview of all the datasets available to them and to allow them to drill down on those results by filtering based upon keys and key/value pairs.

This API could be used to build a webapp that allows users to get an “eagle-eye” view of their data.

Installation

This plugin depends on having installed and configured a dtool-lookup-server. This plugin can then be installed by running the commands below.

git clone https://github.com/jic-dtool/dtool-lookup-server-annotation-filter-plugin.git
cd dtool-lookup-server-annotation-filter-plugin
python setup.py install

See dtool-lookup-server for more information about the setup of the base system.

Routes

This plugin has five routes.

  • POST /annotation_filter_plugin/annotation_keys

  • POST /annotation_filter_plugin/annotation_values

  • POST /annotation_filter_plugin/num_datasets

  • POST /annotation_filter_plugin/datasets

  • GET /annotation_filter_plugin/version

The first gives access to all annotation keys that are present on at least one dataset with a basic value. The keys are only extracted from datasets that pass any annotation filter in the POST request. The response from this route includes the number of datasets associated with each key.

The second gives access to all values for the keys specified in the POST request. The values are only extracted from the datasets that pass the annotation filter in the POST request. The response from this route includes the number of datasets associated with each key/value pair.

The third gives the number of datasets given a particular annotation filter.

The fourth gives the list of datasets given a particular annotation filter.

The fifth returns the version of the plugin.

Filter syntax

Below are examples of JSON queries that can be posted to the routes.

No filters, i.e. get all (this only really makes sense for the /annotation_filter_plugin/annotation_keys route).

{}

Get only datasets that have the key “color”:

{
    "annotation_keys": ["color"]
}

Get only datasets where “color” is set to “red”:

{
    "annotations": {"color": "red"}
}

Get only datasets that have both the keys “color” and “pattern”:

{
    "annotation_keys": ["color", "pattern"]
}

Get only datasets where “color” is set to “red” and “pattern” is set to “stripey”:

{
    "annotations": {"color": "red", "pattern": "stripey"}
}

Get only datasets that have the keys “color” and “pattern” and where the “color” is set to “red”:

{
    "annotation_keys": ["color", "pattern"],
    "annotations": {"color": "red"}
}
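The way these filters combine can be illustrated with a small local sketch in Python. It is not the plugin's implementation. It assumes, based on the examples above, that every listed key and every key/value pair must match (a logical AND). The function name passes_filter and the sample annotations are made up for illustration.

```python
def passes_filter(annotations, query):
    """Return True if a dataset's annotations satisfy the query.

    Assumption: all conditions must hold at once (logical AND).
    """
    required_keys = query.get("annotation_keys", [])
    required_pairs = query.get("annotations", {})
    has_keys = all(key in annotations for key in required_keys)
    has_pairs = all(
        key in annotations and annotations[key] == value
        for key, value in required_pairs.items()
    )
    return has_keys and has_pairs


# Hypothetical annotations of two datasets.
red_stripey = {"color": "red", "pattern": "stripey"}
red_only = {"color": "red"}

query = {"annotation_keys": ["color", "pattern"], "annotations": {"color": "red"}}
print(passes_filter(red_stripey, query))  # True
print(passes_filter(red_only, query))     # False: no "pattern" key
print(passes_filter(red_only, {}))        # True: the empty filter matches everything
```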

Limitations

  • This plugin only recognises annotations where the value is a basic type, such as a string, a number or a boolean value. In other words, annotations whose value is a data structure, such as a list or a dictionary, are ignored.

  • Datasets that do not have any annotation with a basic type as a value will not be picked up by this plugin.
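As a rough illustration of what "basic type" means here, the Python sketch below separates basic values from data structures. The helper names are hypothetical and the plugin may draw the line differently, for example for null values, which this sketch assumes are not basic.

```python
def is_basic(value):
    """Return True for strings, numbers and booleans.

    Assumption: None and nested structures (lists, dicts) are not basic.
    """
    # bool is a subclass of int in Python; it is listed for readability.
    return isinstance(value, (str, int, float, bool))


def basic_annotations(annotations):
    """Keep only the key/value pairs whose value is a basic type."""
    return {key: value for key, value in annotations.items() if is_basic(value)}


# Hypothetical annotations of one dataset.
annotations = {"color": "red", "replicates": 3, "tags": ["a", "b"], "meta": {"x": 1}}
print(basic_annotations(annotations))  # {'color': 'red', 'replicates': 3}
```

A dataset whose annotations are all lists or dictionaries would end up with an empty dictionary here, which corresponds to the second limitation above.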

Usage

Preparation

The dtool lookup server makes use of the Authorization header to pass through the JSON web token for authorization. Below we create environment variables for the token and the header used in the curl commands:

$ TOKEN=$(flask user token olssont)
$ HEADER="Authorization: Bearer $TOKEN"
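For scripted access, the same header can be prepared in Python. The sketch below only builds the request with the standard library and does not send it. The function name build_request and the placeholder token are illustrative, the server address is taken from the curl examples in this document, and whether the version route needs the Authorization header is not stated here.

```python
import json
import urllib.request

# Assumption: the server runs at this address, as in the curl examples.
BASE_URL = "http://localhost:5000"


def build_request(route, token, query=None):
    """Return an unsent urllib Request for one of the plugin routes.

    `route` is the last path component, e.g. "annotation_keys".
    `query` is the filter as a dict; None gives a GET request (version route).
    """
    url = f"{BASE_URL}/annotation_filter_plugin/{route}"
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if query is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(query).encode("utf-8")
    method = "GET" if data is None else "POST"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# "example-token" is a placeholder for the output of `flask user token <username>`.
req = build_request("num_datasets", "example-token", {"annotations": {"color": "red"}})
print(req.get_method(), req.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```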

Find keys available for filtering and the number of datasets associated with them

The command below finds all annotation keys available for further filtering:

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{}'  \
    http://localhost:5000/annotation_filter_plugin/annotation_keys

The response below means that the annotation key “color” has 120 datasets associated with it, “pattern” has 50 and “size” has 10.

{"color": 120, "pattern": 50, "size": 10}

Suppose that one chooses to filter further based on the “pattern” annotation key. Using the command below one could find the annotation keys that are still relevant given that each dataset has to have the annotation key “pattern”.

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{"annotation_keys": ["pattern"]}'  \
    http://localhost:5000/annotation_filter_plugin/annotation_keys

The response below shows that none of the remaining datasets has the key “size” and that 45 of the 50 datasets with the key “pattern” also have the key “color”.

{"color": 45, "pattern": 50}
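How such counts arise can be reproduced on a toy example. The Python sketch below counts keys over a handful of made-up datasets, first without a filter and then requiring the key “pattern”. It is a local illustration under the assumption that each dataset is counted once per key, not the plugin's code.

```python
from collections import Counter


def count_keys(datasets, required_keys=()):
    """Count, per annotation key, the datasets that carry it.

    Only datasets that have every key in `required_keys` are counted.
    """
    selected = [d for d in datasets if all(key in d for key in required_keys)]
    return dict(Counter(key for d in selected for key in d))


# Hypothetical annotations of four datasets.
datasets = [
    {"color": "red", "pattern": "stripey"},
    {"color": "blue"},
    {"pattern": "wavy"},
    {"color": "red", "size": "large"},
]

print(count_keys(datasets))               # {'color': 3, 'pattern': 2, 'size': 1}
print(count_keys(datasets, ["pattern"]))  # {'color': 1, 'pattern': 2}
```

As in the responses above, “size” disappears once “pattern” is required, because no dataset in the toy data has both keys.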

It is possible to filter based on an annotation key/value pair. For example, to limit the datasets to the case where the “pattern” is “stripey” one could use the command below.

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{"annotations": {"pattern": "stripey"}}'  \
    http://localhost:5000/annotation_filter_plugin/annotation_keys

The response below shows that this is more specific and that there are fewer results.

{"color": 5, "pattern": 10}

It is possible to make more complex queries. The command below also requires that the datasets have the key “color”.

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{"annotation_keys": ["color"], "annotations": {"pattern": "stripey"}}'  \
    http://localhost:5000/annotation_filter_plugin/annotation_keys

In the response below there are now fewer datasets with the “pattern” key. That is because some of the datasets that were picked up previously did not have the “color” key.

{"color": 5, "pattern": 3}

It is also possible to filter using base URIs. The command below limits the keys to those from the base URIs “s3://snow-white” and “s3://mr-men”:

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{"base_uris": ["s3://snow-white", "s3://mr-men"]}'  \
    http://localhost:5000/annotation_filter_plugin/annotation_keys

The response below shows that there are fewer hits than when all base URIs are included.

{"color": 77, "pattern": 35, "size": 4}
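When queries are assembled in a script, a small helper can keep unused parts out of the JSON. The Python sketch below is one possible way to do this. The function name make_query is hypothetical, and the field names are the ones used in the queries in this document.

```python
import json


def make_query(base_uris=None, annotation_keys=None, annotations=None):
    """Build a filter dictionary, leaving out any part that is not set."""
    query = {}
    if base_uris:
        query["base_uris"] = list(base_uris)
    if annotation_keys:
        query["annotation_keys"] = list(annotation_keys)
    if annotations:
        query["annotations"] = dict(annotations)
    return query


query = make_query(
    base_uris=["s3://snow-white", "s3://mr-men"],
    annotation_keys=["color"],
)
# The JSON string can be passed to curl via -d or sent from a script.
print(json.dumps(query))
```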

Find annotations available for filtering and the number of datasets associated with them

The pattern for finding annotation key/value pairs and the number of datasets associated with them is similar to that of finding the keys (above).

The command below can be used to find all the values associated with the “color” key and the number of datasets that have been annotated with each particular value.

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{"annotation_keys": ["color"]}'  \
    http://localhost:5000/annotation_filter_plugin/annotation_values

The response below shows that there are five colors available and that “red” is the most common one.

{
    "color": {
        "red": 50,
        "pink": 30,
        "blue": 20,
        "green": 15,
        "yellow": 5
    }
}
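A client that wants to drill down could rank the values in such a response and turn the chosen value into the next filter. The Python sketch below works on the response shown above; the helper names are hypothetical.

```python
# Response copied from the example above.
response = {
    "color": {"red": 50, "pink": 30, "blue": 20, "green": 15, "yellow": 5}
}


def ranked_values(response, key):
    """Return (value, count) pairs for `key`, most frequent first."""
    return sorted(response[key].items(), key=lambda item: item[1], reverse=True)


def drill_down_filter(key, value):
    """Build the filter that selects datasets where `key` equals `value`."""
    return {"annotations": {key: value}}


top_value, top_count = ranked_values(response, "color")[0]
total = sum(response["color"].values())
print(top_value, top_count, total)            # red 50 120
print(drill_down_filter("color", top_value))  # {'annotations': {'color': 'red'}}
```

The total of 120 matches the number of datasets reported for the “color” key earlier in this document.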

To get data for more keys, they need to be included in the filter. The command below returns the values from datasets that have annotations for both “color” and “pattern”.

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{"annotation_keys": ["color", "pattern"]}'  \
    http://localhost:5000/annotation_filter_plugin/annotation_values

The response contains fewer colors because some of the datasets annotated with a color did not have a pattern annotation.

{
    "color": {
        "red": 15,
        "pink": 10,
        "blue": 10,
        "green": 10
    },
    "pattern": {
        "stripey": 40,
        "wavy": 10
    }
}

It is possible to make more specific queries. The command below also requires that the datasets have the stripey pattern.

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{"annotation_keys": ["color"], "annotations": {"pattern": "stripey"}}'  \
    http://localhost:5000/annotation_filter_plugin/annotation_values

The response below shows that fewer datasets have been used to collect the annotation information.

{
    "color": {
        "red": 15,
        "pink": 10,
        "blue": 10,
        "green": 5
    },
    "pattern": {
        "stripey": 40
    }
}

It is also possible to filter using base URIs. The command below limits the values to those from the base URIs “s3://snow-white” and “s3://mr-men”:

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{"annotation_keys": ["color"], "base_uris": ["s3://snow-white", "s3://mr-men"]}'  \
    http://localhost:5000/annotation_filter_plugin/annotation_values

The response below shows that there are fewer hits than when all base URIs are included.

{
    "color": {
        "red": 50,
        "pink": 20,
        "blue": 7
    }
}

Listing the number of datasets available for a particular filter

The number of datasets selected by a particular filter can be determined using the /annotation_filter_plugin/num_datasets route. The command below selects all datasets with at least one basic value (see the Limitations section above for an explanation of what a basic value is).

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{}'  \
    http://localhost:5000/annotation_filter_plugin/num_datasets

The response below shows that there are 145 such datasets.

145

The command below uses a filter to select only datasets that have the key/value pair “pattern”/“stripey”.

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{"annotations": {"pattern": "stripey"}}'  \
    http://localhost:5000/annotation_filter_plugin/num_datasets

The response shows that there are 10 such datasets.

10
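Because the response body is a bare number, it can be parsed as JSON directly. The Python sketch below assumes the bodies look exactly like the two responses above and uses them to work out what share of the datasets a filter selects.

```python
import json

# Response bodies copied from the two examples above.
all_datasets = json.loads("145")
stripey_datasets = json.loads("10")

share = stripey_datasets / all_datasets
print(f"{stripey_datasets} of {all_datasets} datasets ({share:.1%})")
```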

Retrieving information about datasets selected by a particular filter

It is possible to get information about the datasets selected by a particular filter using the /annotation_filter_plugin/datasets route. The command below uses a filter to select only datasets that have the key/value pair “pattern”/“stripey”.

$ curl -H "$HEADER" -H "Content-Type: application/json"  \
    -X POST -d '{"annotations": {"pattern": "stripey"}}'  \
    http://localhost:5000/annotation_filter_plugin/datasets

Below is a truncated version of the response.

[
  {
    "annotations": {
      "pattern": "stripey"
    },
    "base_uri": "s3://dtool-demo",
    "created_at": "1530803916.74",
    "creator_username": "olssont",
    "dtoolcore_version": "3.3.0",
    "frozen_at": "1536749825.85",
    "name": "hypocotyl3",
    "type": "dataset",
    "uri": "s3://dtool-demo/ba92a5fa-d3b4-4f10-bcb9-947f62e652db",
    "uuid": "ba92a5fa-d3b4-4f10-bcb9-947f62e652db"
  }
  ...
]
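The records in this response can be post-processed in a script. The Python sketch below takes some fields from the record shown above and converts frozen_at into a date. It assumes that created_at and frozen_at are seconds since the Unix epoch stored as strings; the helper name to_datetime is hypothetical.

```python
from datetime import datetime, timezone

# Fields taken from the record in the truncated response above.
dataset = {
    "base_uri": "s3://dtool-demo",
    "created_at": "1530803916.74",
    "frozen_at": "1536749825.85",
    "name": "hypocotyl3",
    "uri": "s3://dtool-demo/ba92a5fa-d3b4-4f10-bcb9-947f62e652db",
}


def to_datetime(timestamp):
    """Convert a timestamp string to a UTC datetime.

    Assumption: the value is seconds since the Unix epoch.
    """
    return datetime.fromtimestamp(float(timestamp), tz=timezone.utc)


frozen = to_datetime(dataset["frozen_at"])
print(dataset["name"], frozen.date().isoformat())  # hypocotyl3 2018-09-12
```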
