oarepo-taxonomies

A wrapper that connects Flask-Taxonomies with Invenio.

Installation

The package is installed from PyPI in the usual way:

pip install oarepo-taxonomies

The database is initialized using the management task:

invenio taxonomies init

Config

Search serializer

Taxonomic facets can only be used if your search serializer in RECORDS_REST_ENDPOINTS is wrapped with taxonomy_enabled_search. The wrapper takes the search serializer as its first argument and the list of enabled taxonomies (taxonomy_aggs) as its second. Once the serializer is wrapped, taxonomy_term_facet can be used.

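RECORDS_REST_ENDPOINTS = {
    ...
    'search_serializers': {
        'application/json': taxonomy_enabled_search(
            json_search,
            taxonomy_aggs=["degreeGrantor"],
            # passed by keyword, because a positional argument cannot follow
            # a keyword argument; the keyword name is assumed from the original
            fallback_language=fallback_language,
        ),
    },
    ...
}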

Usage

All functionality is provided by flask-taxonomies. For more details see: flask-taxonomies.

In addition, this package adds the ability to import and export taxonomies using Excel files (*.xlsx) and can dereference a reference to a taxonomy in an Invenio record.

Import from Excel

Importing from Excel is handled by the management task:

invenio taxonomies import [OPTIONS] TAXONOMY_FILE

Options:
--int TEXT
--str TEXT
--bool TEXT
--drop / --no-drop
--help

where:

  • TAXONOMY_FILE is the path to the xlsx file (the older xls format is not supported).
  • --int, --str and --bool are repeatable options that determine the data type of the imported values.
  • --drop/--no-drop specifies whether the old taxonomy should be removed from the database when a taxonomy with the same taxonomy code is imported.

Structure of Excel file

Blocks

The Excel file must contain two blocks. The first block contains the taxonomy information and must contain one mandatory column, code (the taxonomy identifier). It can also contain other user data (e.g. title or description).

The second block must be separated from the first by a blank line and must contain two mandatory columns, level and slug, in exactly that order. The other columns are optional.
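The two-block layout can be checked with a few lines of Python. The following is only a minimal sketch, not the importer shipped with this package: it assumes the sheet has already been read into a list of rows (lists of cell values), and the function name split_blocks is hypothetical.

```python
def split_blocks(rows):
    """Split spreadsheet rows into (taxonomy_block, term_block).

    `rows` is a list of rows, each a list of cell values. A row whose cells
    are all empty counts as the blank separator row. Hypothetical helper,
    not part of oarepo-taxonomies.
    """
    def is_blank(row):
        return all(cell is None or str(cell).strip() == "" for cell in row)

    blocks, current = [], []
    for row in rows:
        if is_blank(row):
            if current:  # a blank row closes the block collected so far
                blocks.append(current)
                current = []
        else:
            current.append(row)
    if current:
        blocks.append(current)

    if len(blocks) != 2:
        raise ValueError("expected two blocks separated by a blank row")
    taxonomy_block, term_block = blocks

    # The first block needs a 'code' column somewhere in its header.
    if "code" not in taxonomy_block[0]:
        raise ValueError("taxonomy block must contain a 'code' column")
    # The second block must start with 'level' and 'slug', in that order.
    if list(term_block[0][:2]) != ["level", "slug"]:
        raise ValueError("term block must start with 'level' and 'slug'")
    return taxonomy_block, term_block
```

The header row of each block is returned as the first row of that block, so the caller can pair column names with the values that follow.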

Nested JSON
Taxonomies are internally represented as JSON, which can be nested. An Excel spreadsheet is inherently linear and cannot store nested data directly. oarepo-taxonomies nevertheless supports nested JSON: each value in a nested JSON has its own unique address, in which the JSON levels are separated by underscores, so any nested JSON can be transformed into a linear form as follows.

Nested:

{
    "a": 1,
    "b": 2,
    "c": [{"d": [2, 3, 4], "e": [{"f": 1, "g": 2}]}]
}

Linear:

{"a": 1,
 "b": 2,
 "c_0_d_0": 2,
 "c_0_d_1": 3,
 "c_0_d_2": 4,
 "c_0_e_0_f": 1,
 "c_0_e_0_g": 2
}

According to the same pattern, headings can be created in Excel and the data is transformed into a nested form.
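The addressing scheme can be sketched in plain Python. This is an illustration of the pattern described above, not the code used by oarepo-taxonomies; the function names flatten and unflatten are hypothetical, and the sketch assumes that keys do not themselves contain underscores and that purely numeric address parts are list indices.

```python
def flatten(value, prefix=""):
    """Turn nested dicts/lists into one flat dict with underscore-joined keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}  # a leaf: store it under its full address
    flat = {}
    for key, child in items:
        child_prefix = f"{prefix}_{key}" if prefix else str(key)
        flat.update(flatten(child, child_prefix))
    return flat


def unflatten(flat):
    """Rebuild the nested structure from underscore-joined addresses."""
    root = {}
    for address, value in flat.items():
        parts = address.split("_")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return _dicts_to_lists(root)


def _dicts_to_lists(node):
    """Convert dicts whose keys are all digits ('0', '1', ...) into lists."""
    if not isinstance(node, dict):
        return node
    converted = {key: _dicts_to_lists(child) for key, child in node.items()}
    if converted and all(key.isdigit() for key in converted):
        return [converted[key] for key in sorted(converted, key=int)]
    return converted


nested = {"a": 1, "b": 2, "c": [{"d": [2, 3, 4], "e": [{"f": 1, "g": 2}]}]}
linear = flatten(nested)
```

With this sketch, a header row such as code, title_cs, title_en maps back to a nested title object with the keys cs and en.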

Level order

Taxonomies are tree structures, which are not linear either and cannot be stored directly in a spreadsheet. The rows are therefore listed depth-first, from the root to the lowest child: root (taxonomy) -> first level 1 term -> all of its descendants down to the last level -> second level 1 term -> all of its descendants, and so on.

Excel example

code    title_cs  title_en
cities  Města     Cities

level  slug  title_cs        title_en
1      eu    Evropa          Europe
2      cz    Česko           Czechia
3      prg   Praha           Prague
3      brn   Brno            Brno
2      de    Německo         Germany
3      ber   Berlín          Berlin
3      mun   Mnichov         Munich
2      gb    Velká Británie  United Kingdom
3      lon   Londýn          London
3      man   Manchester      Manchester

The resulting JSON for the taxonomy will take the following form:

{
  "code": "cities",
  "title": {
    "cs": "Města",
    "en": "Cities"
  }
}

and for individual Taxonomy Term:

{
  "code": "Praha",
  "title": {
    "cs": "Praha",
    "en": "Prague"
  }
}

and tree structure:

cities  
└-eu  
  |--cz  
  |  |--prg  
  |  └--brn  
  |--de  
  |  |--ber   
  |  └--mun  
  └--gb  
     |--lon   
     └--man      
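How the level and slug columns encode this tree can be sketched as follows. This is only an illustration of the row order, not the importer's code; the function name rows_to_paths is hypothetical. It keeps a stack of the slugs on the current branch and cuts the stack back whenever a row returns to a shallower level.

```python
def rows_to_paths(rows):
    """Convert depth-first (level, slug) rows into slash-separated term paths."""
    stack = []  # slugs on the branch from the first level down to the current row
    paths = []
    for level, slug in rows:
        # A row may go at most one level deeper than the previous row.
        if level < 1 or level > len(stack) + 1:
            raise ValueError(f"unexpected level {level} for slug {slug!r}")
        del stack[level - 1:]  # drop everything at this depth and below
        stack.append(slug)
        paths.append("/".join(stack))
    return paths


rows = [
    (1, "eu"),
    (2, "cz"), (3, "prg"), (3, "brn"),
    (2, "de"), (3, "ber"), (3, "mun"),
    (2, "gb"), (3, "lon"), (3, "man"),
]
paths = rows_to_paths(rows)
```

Each resulting path lists a term's ancestors in order, for example eu/cz/prg for Prague.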

Export to Excel

Exporting to Excel is handled by the management task invenio taxonomies export TAXONOMY_CODE.

An xlsx file and a csv file are created in the folder where the task was run.

Marshmallow

The Marshmallow module serializes a taxonomy and dereferences the reference given in links/self. The module provides the Marshmallow field TaxonomyField and the schema TaxonomySchema, both of which can be used freely in a user schema. TaxonomyField and TaxonomySchema accept any user data and check whether it is a JSON object (dict), a string or a list.

Serialized taxonomies are output as a taxonomic list, which contains the ancestors in addition to the taxonomy term itself. The list is ordered from the root ancestor down to the referenced term. Because of this fixed format, the serialization is opinionated. An example of a serialized taxonomy follows:

[
    {
        "is_ancestor": true,
        "links": {"self": "http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a"},
        "test": "extra_data"
    },
    {
        "created_at": "2014-08-11T05:26:03.869245",
        "email": "ken@yahoo.com",
        "is_ancestor": false,
        "links": {
            "parent": "http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a",
            "self": "http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b"
        },
        "name": "Ken",
        "test": "extra_data"
    }
]
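The shape of this list (ancestors first, the referenced term last, is_ancestor set on every item except the last) can be sketched without a database. The helper below is hypothetical and only builds the is_ancestor and links parts from a term path; in the real library the remaining data comes from the stored taxonomy terms. It assumes, as in the example above, that a top-level term carries no parent link.

```python
def taxonomic_list_skeleton(base_url, taxonomy_code, slug_path):
    """Build the ancestor-to-term list for a term path such as 'a/b'.

    Hypothetical illustration: only 'is_ancestor' and 'links' are filled in.
    """
    slugs = slug_path.strip("/").split("/")
    entries = []
    parent_url = None
    for depth in range(1, len(slugs) + 1):
        self_url = "/".join([base_url.rstrip("/"), taxonomy_code, *slugs[:depth]])
        links = {"self": self_url}
        if parent_url is not None:  # assumption: top-level terms have no parent link
            links["parent"] = parent_url
        # Every entry except the last one is an ancestor of the referenced term.
        entries.append({"is_ancestor": depth < len(slugs), "links": links})
        parent_url = self_url
    return entries


skeleton = taxonomic_list_skeleton(
    "http://127.0.0.1:5000/2.0/taxonomies", "test_taxonomy", "a/b"
)
```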

The taxonomy representation can be changed in the config file (e.g. invenio.cfg). For more details please see Flask-Taxonomies.

This library uses a predefined config that is located in config.py:

FLASK_TAXONOMIES_REPRESENTATION = {
    "taxonomy": {
        'include': [INCLUDE_DATA, INCLUDE_ANCESTORS, INCLUDE_URL, INCLUDE_SELF,
                    INCLUDE_ANCESTOR_LIST, INCLUDE_ANCESTOR_TAG, INCLUDE_PARENT],
        'exclude': [],
        'select': None,
        'options': {}
    }
}

TaxonomyField accepts two kinds of input.

  1. The input is a dictionary or a text string containing a link to the taxonomy.
    • dictionary: The dictionary must contain a nested dictionary named links, which contains self.
    • string: Any text that contains a URL to the taxonomy.
  2. The input is a list of ancestors, where the last item is the referenced taxonomy.
  • dictionary
from marshmallow import Schema

from oarepo_taxonomies.marshmallow import TaxonomyField

# custom schema
class TestSchema(Schema):
    field = TaxonomyField()

# taxonomy dict
random_user_taxonomy = {
    "created_at": "2014-08-11T05:26:03.869245",
    "email": "ken@yahoo.com",
    "name": "Ken",
    "links": {
        "self": "http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b"
    }
}

# record dict
data = {
    "field": random_user_taxonomy
}

schema = TestSchema()
result = schema.load(data)
assert result == {
    'field': [{
        'is_ancestor': True,
        'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
        'test': 'extra_data'
    },
        {
            'created_at': '2014-08-11T05:26:03.869245',
            'email': 'ken@yahoo.com',
            'is_ancestor': False,
            'links': {
                'parent': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a',
                'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
            },
            'name': 'Ken',
            'test': 'extra_data'
        }]
}
  • string
from marshmallow import Schema

from oarepo_taxonomies.marshmallow import TaxonomyField

# custom schema
class TestSchema(Schema):
    field = TaxonomyField()

# taxonomy reference as any string with url
random_user_taxonomy = "bla bla http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b"

# record dict
data = {
    "field": random_user_taxonomy
}

schema = TestSchema()
result = schema.load(data)
assert result == {
    'field': [{
                  'is_ancestor': True,
                  'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
                  'test': 'extra_data'
              },
              {
                  'is_ancestor': False,
                  'links': {
                      'parent': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a',
                      'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
                  },
                  'test': 'extra_data'
              }]
}
  • list
from marshmallow import Schema

from oarepo_taxonomies.marshmallow import TaxonomyField

# custom schema
class TestSchema(Schema):
    field = TaxonomyField()

# taxonomy list with ancestor (root ancestor at the first place)
random_user_taxonomy = [
    {
        'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
    },
    {
        'links': {
            'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
        },
        'test': 'extra_data',
        'next': 'bla',
        'another': 'something'
    }
]

# record dict
data = {
    "field": random_user_taxonomy
}

schema = TestSchema()
result = schema.load(data)
assert result == {
    'field': [{
        'is_ancestor': True,
        'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
        'test': 'extra_data'
    },
        {
            'another': 'something',
            'is_ancestor': False,
            'links': {
                'parent': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a',
                'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
            },
            'next': 'bla',
            'test': 'extra_data'
        }]
}
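What the three input shapes have in common is that each one carries a URL to the referenced term. The sketch below shows one way that URL could be picked out; it is not the code used by TaxonomyField, and the function name extract_term_url and the regular expression are assumptions made for illustration.

```python
import re

# Assumption: a URL is any run of non-whitespace characters after http(s)://
URL_PATTERN = re.compile(r"https?://\S+")


def extract_term_url(value):
    """Return the term URL from a dict, a string or a list of ancestors."""
    if isinstance(value, list):
        if not value:
            raise ValueError("the list of ancestors is empty")
        value = value[-1]  # the last item is the referenced term
    if isinstance(value, dict):
        try:
            return value["links"]["self"]
        except (KeyError, TypeError):
            raise ValueError("a dictionary must contain links/self") from None
    if isinstance(value, str):
        match = URL_PATTERN.search(value)
        if match is None:
            raise ValueError("the string does not contain a URL")
        return match.group(0)
    raise TypeError(f"unsupported input type: {type(value).__name__}")
```

All three examples above resolve to the same URL, http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b, which is why they produce the same ancestor list.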

TaxonomyField vs. TaxonomySchema

TaxonomySchema is a marshmallow schema that can be subclassed and used, for example, inside Nested.

TaxonomyField is a marshmallow Field that is used as is. The field also allows extending the taxonomy metadata model with extra properties.

The signature of the factory is TaxonomyField(*args, extra=None, name=None, many=False, mixins: list = None, **kwargs), where:

  • args: arbitrary arguments passed to marshmallow.schema
  • extra: a dictionary of extra marshmallow fields (key: field name, value: instance of Field)
  • name: optional name of the field (it is used as the name of the class that is created dynamically in the background)
  • mixins: list of added mixins (classes defining extra marshmallow fields)
  • kwargs: arbitrary named arguments passed to the generated marshmallow schemas
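from marshmallow import Schema
# assumed import location of SanitizedUnicode; adjust it to your installation
from invenio_records_rest.schemas.fields import SanitizedUnicode

from oarepo_taxonomies.marshmallow import TaxonomyField

# mixin that adds two extra fields to the taxonomy term
class InstitutionMixin:
    name = SanitizedUnicode()
    address = SanitizedUnicode()

class TestSchema(Schema):
    field = TaxonomyField(many=True, mixins=[InstitutionMixin])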

random_user_taxonomy = [
        {
            'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
        },
        {
            'links': {
                'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
            },
            'test': 'extra_data',
            'next': 'bla',
            'another': 'something',
            'name': 'Hogwarts',
            'address': 'Platform nine and three-quarters'
        }
    ]

data = {
    "field": random_user_taxonomy
}

schema = TestSchema()
result = schema.load(data)
assert result == {
    'field': [{
                  'is_ancestor': True,
                  'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
                  'test': 'extra_data'
              },
              {
                  'address': 'Platform nine and three-quarters',
                  'another': 'something',
                  'is_ancestor': False,
                  'links': {
                      'parent': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a',
                      'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
                  },
                  'name': 'Hogwarts',
                  'next': 'bla',
                  'test': 'extra_data'
              }]
}

JSONSchemas

The library offers a predefined JSON schema for taxonomies. The predefined schema is referenced with "$ref": "../taxonomy-v2.0.0.json#/definitions/TaxonomyTerm" and is listed in Invenio by current_jsonschemas.list_schemas().

Custom schema example:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "id": "https://example.com/schemas/example_json-v1.0.0.json",
  "additionalProperties": false,
  "title": "My site v1.0.0",
  "type": "object",
  "properties": {
    "$schema": {
      "type": "string"
    },
    "custom_taxonomy": {
      "$ref": "../taxonomy-v2.0.0.json#/definitions/TaxonomyTerm"
    }
  }
}

Elasticsearch mapping

Predefined mappings can be used for indexing into Elasticsearch. To use them, you must also use the OAREPO mapping includes library. A reference to the taxonomy mapping is then inserted into the custom mapping as either "type": "taxonomy-v2.0.0.json#/TaxonomyTerm" or "type": "taxonomy-term".

Custom mapping example:

{
  "mappings": {
    "date_detection": false,
    "numeric_detection": false,
    "dynamic": false,
    "properties": {
      "$schema": {
        "type": "keyword",
        "index": true
      },
      "custom_taxonomy": {
        "type": "taxonomy-v2.0.0.json#/TaxonomyTerm"
      }
    }
  }
}

Signals

This module registers the following handlers on the Flask-Taxonomies signals. They keep taxonomy references in records consistent whenever a Taxonomy or TaxonomyTerm changes:

Flask-Taxonomies signal | Registered signal handler | Description
before_taxonomy_deleted | taxonomy_delete | Checks whether the taxonomy being deleted is referenced by any record. If so, an exception is raised.
before_taxonomy_term_deleted | taxonomy_term_delete | Checks whether the TaxonomyTerm being deleted is referenced by any record. If so, an exception is raised.
after_taxonomy_term_updated | taxonomy_term_update | Replaces the contents of the changed TaxonomyTerm in the records that reference it.
after_taxonomy_term_moved | taxonomy_term_moved | Replaces the link to the moved TaxonomyTerm in the records.
