Skip to main content

toolkit that assists with generating LinkML schemas from existing serializations like JSON-schema

Project description

LinkML Schema Automator

DOI

This is a toolkit that assists with:

  1. Bootstrapping LinkML models from instance data
    • TSVs and spreadsheets
    • SQLite databases
    • RDF instance graphs
  2. Bootstrapping a LinkML model from a different schema representation (i.e. opposite of a linkml.generator)
    • OWL (RDFS-like subset)
    • TODO: JSON-Schema, XSD, ShEx, SHACL, SQL DDL, FHIR, Python dataclasses/pydantic, etc
  3. Using automated methods to enhance a model
    • Using text mining and concept annotator APIs to enrich semantic enums
    • TODO: querying sparql endpoints to retrieve additional metadata

These can be composed together. For example, run tsvs2linkml followed by annotate-enums

The toolkit is still experimental. It is intended as an aid to schema creation rather than act as a formal conversion tool

Installation

linkml-model-enrichment and its components require Python 3.9 or greater.

chmod 755 environment.sh
. environment.sh
pip install -r requirements.txt
pip install -e . 

Command Line Usage

Annotating Enums

This toolkit allows automated annotation of LinkML enums, mapping text strings to ontology terms.

The command line tool annotate-enums takes a LinkML schema, with enums and fills in the meaning slots.

See the annotators folder for docs

Converting TSVs

The tsv2linkml command infers a single-class schema from a TSV datafile

$ tsv2linkml --help
Usage: tsv2linkml [OPTIONS] TSVFILE

  Infer a model from a TSV

Options:
  -o, --output TEXT        Output file
  -c, --class_name TEXT    Core class name in schema
  -n, --schema_name TEXT   Schema name
  -s, --sep TEXT           separator
  -E, --enum-columns TEXT  column that is forced to be an enum
  --robot / --no-robot     set if the TSV is a ROBOT template
  --help                   Show this message and exit.

Example:

tsv2linkml tests/resources/biobank-specimens.tsv 

The tsvs2linkml command infers a multi-class schema from multiple TSV datafiles

$ tsvs2linkml --help
Usage: tsvs2linkml [OPTIONS] [TSVFILES]...

  Infer a model from multiple TSVs

Options:
  -o, --output TEXT         Output file
  -n, --schema_name TEXT    Schema name
  -s, --sep TEXT            separator
  -E, --enum-columns TEXT   column(s) that is forced to be an enum
  --enum-mask-columns TEXT  column(s) that are excluded from being enums
  --max-enum-size INTEGER   do not create an enum if more than max distinct
                            members

  --enum-threshold FLOAT    if the number of distinct values / rows is less
                            than this, do not make an enum

  --robot / --no-robot      set if the TSV is a ROBOT template
  --help                    Show this message and exit.

Converting OWL

$ owl2linkml --help
Usage: owl2linkml [OPTIONS] OWLFILE

  Infer a model from OWL Ontology

  Note: input must be in functional syntax

Options:
  -n, --name TEXT  Schema name
  --help           Show this message and exit.

Example:

owl2linkml -n prov tests/resources/prov.ofn > prov.yaml

Note this works best on schema-style ontologies such as Prov

NOT recommended for terminological-style ontologies such as OBO

Converting RDF instance graphs

$ rdf2linkml --help
Usage: rdf2linkml [OPTIONS] RDFFILE

  Infer a model from RDF instance data

Options:
  -d, --dir TEXT  [required]
  --help          Show this message and exit.

Converting JSON Instance Data

$ jsondata2linkml --help
Usage: jsondata2linkml [OPTIONS] INPUT

  Infer a model from JSON instance data



Options:
  --container-class-name TEXT   name of root class
  -f, --format TEXT             json or yaml (or json.gz or yaml.gz)
  --omit-null / --no-omit-null  if true, ignore null values
  --help                        Show this message and exit.

Converting JSON-Schema

$ jsonschema2linkml --help
Usage: jsonschema2linkml [OPTIONS] INPUT

  Infer a model from JSON Schema

Options:
  -n, --name TEXT    ID of schema  [required]
  -f, --format TEXT  JSON Schema format - yaml or json
  -o, --output TEXT  output path
  --help             Show this message and exit.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

schema_automator-1.0.0b0.tar.gz (5.1 kB view hashes)

Uploaded Source

Built Distribution

schema_automator-1.0.0b0-py3-none-any.whl (4.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page