JSON schema and validation code for HEPData submissions
Documentation: http://hepdata-validator.readthedocs.io
Installation
If you can, install LibYAML (a C library for parsing and emitting YAML) on your machine. This allows the use of CLoader for faster loading of YAML files. The speed-up is negligible for small files but substantial for larger documents.
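To check whether your PyYAML installation was built against LibYAML, you can inspect PyYAML's `__with_libyaml__` flag. The sketch below is a standalone check and not part of hepdata-validator: it picks PyYAML's C-accelerated safe loader (`CSafeLoader`) when the C bindings are available and falls back to the pure-Python `SafeLoader` otherwise.

```python
import yaml

# PyYAML sets __with_libyaml__ to True only when its LibYAML C bindings
# could be imported; the C loader classes exist only in that case.
if getattr(yaml, "__with_libyaml__", False):
    loader_cls = yaml.CSafeLoader
else:
    loader_cls = yaml.SafeLoader

print("LibYAML available:", loader_cls is not yaml.SafeLoader)

# Both loaders turn the same YAML text into the same Python objects;
# only the parsing speed differs.
doc = yaml.load("independent_variables: []\ndependent_variables: []", Loader=loader_cls)
print(doc)
```

If the first line prints `False`, installing LibYAML and then reinstalling PyYAML is usually what enables the C loader.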
Via pip:

```shell
pip install hepdata-validator
```

Via GitHub (for developers):

```shell
git clone https://github.com/HEPData/hepdata-validator
cd hepdata-validator
# quote the extras so shells such as zsh do not expand the brackets
pip install --upgrade -e '.[tests]'
pytest testsuite
```
Usage
To validate submission files, instantiate a SubmissionFileValidator object:
```python
from hepdata_validator.submission_file_validator import SubmissionFileValidator

submission_file_validator = SubmissionFileValidator()
submission_file_path = 'submission.yaml'

# the validate method takes a string representing the file path
is_valid_submission_file = submission_file_validator.validate(file_path=submission_file_path)

# if there are any error messages, they are retrievable through this call
submission_file_validator.get_messages()

# the error messages can be printed
submission_file_validator.print_errors(submission_file_path)
```
To validate data files, instantiate a DataFileValidator object:
```python
from hepdata_validator.data_file_validator import DataFileValidator

data_file_validator = DataFileValidator()

# the validate method takes a string representing the file path
is_valid_data_file = data_file_validator.validate(file_path='data.yaml')

# if there are any error messages, they are retrievable through this call
data_file_validator.get_messages()

# the error messages can be printed
data_file_validator.print_errors('data.yaml')
```
Optionally, if you have already loaded the YAML object, you can pass it through the data argument. You must still pass file_path, because it is used as the key for the error message lookup map.
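```python
from hepdata_validator.data_file_validator import DataFileValidator
import yaml

# load the YAML data file yourself; the with block closes the file afterwards
with open('data.yaml', 'r') as data_file:
    file_contents = yaml.safe_load(data_file)

data_file_validator = DataFileValidator()
# file_path is still required: it is the key for the error message lookup
data_file_validator.validate(file_path='data.yaml', data=file_contents)
data_file_validator.get_messages('data.yaml')
data_file_validator.print_errors('data.yaml')
```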
For the analogous case of the SubmissionFileValidator:
```python
from hepdata_validator.submission_file_validator import SubmissionFileValidator
import yaml

submission_file_path = 'submission.yaml'

# submission.yaml contains several YAML documents; yaml.safe_load_all returns
# a generator, so convert it into a list while the file is still open
with open(submission_file_path, 'r') as submission_file:
    docs = list(yaml.safe_load_all(submission_file))

submission_file_validator = SubmissionFileValidator()
is_valid_submission_file = submission_file_validator.validate(file_path=submission_file_path, data=docs)
submission_file_validator.print_errors(submission_file_path)
```
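To picture why file_path is needed even when you pass the data directly, it helps to think of the error messages as a map from file path to a list of messages. The class below is a hypothetical toy model of that idea, not hepdata-validator's internal implementation; the class name, `add_message`, `has_errors` and the return shapes are assumptions, and `get_messages` only mirrors the call pattern used above (no argument for everything, a file path for one file).

```python
from collections import defaultdict
from typing import Dict, List, Optional


class MessageStore:
    """Toy error-message store keyed by file path (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._messages: Dict[str, List[str]] = defaultdict(list)

    def add_message(self, file_path: str, message: str) -> None:
        # Messages are filed under the path they belong to.
        self._messages[file_path].append(message)

    def get_messages(self, file_path: Optional[str] = None):
        # No argument: return the whole map; otherwise only that file's list.
        if file_path is None:
            return dict(self._messages)
        return list(self._messages.get(file_path, []))

    def has_errors(self, file_path: str) -> bool:
        return bool(self._messages.get(file_path))


store = MessageStore()
store.add_message("data.yaml", "'values' is a required property")
store.add_message("submission.yaml", "'data_file' is a required property")

print(store.get_messages("data.yaml"))
print(sorted(store.get_messages()))
```

Without a path there would be no key to file a message under, which is why a validator built this way needs file_path even when it never opens the file.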
An example offline validation script uses the hepdata_validator package to validate the submission.yaml file and all YAML data files of a HEPData submission.
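The core of such a script is finding the data files that submission.yaml refers to. In a HEPData submission.yaml, each table document names its data file under a data_file key, while the first document may hold only submission-level information. The helper below is a hypothetical sketch (the function name and the choice to resolve paths relative to the directory containing submission.yaml are assumptions). It works on the list of documents produced by yaml.safe_load_all, and each returned path could then be passed to DataFileValidator.validate.

```python
import os
from typing import Iterable, List, Optional


def data_files_from_submission(docs: Iterable[Optional[dict]], submission_path: str) -> List[str]:
    """Return paths of data files referenced by submission.yaml documents.

    Hypothetical helper: documents without a 'data_file' key (for example a
    submission-level first document) are skipped, and each data file path is
    resolved relative to the directory containing submission.yaml.
    """
    base_dir = os.path.dirname(submission_path)
    paths = []
    for doc in docs:
        # yaml.safe_load_all yields None for an empty document, so check the type.
        if isinstance(doc, dict) and "data_file" in doc:
            paths.append(os.path.join(base_dir, doc["data_file"]))
    return paths


docs = [
    {"comment": "Submission-level information"},
    {"name": "Table 1", "data_file": "data1.yaml"},
    None,
    {"name": "Table 2", "data_file": "data2.yaml"},
]
files = data_files_from_submission(docs, os.path.join("my_submission", "submission.yaml"))
print(files)
```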
Schema Versions
The native HEPData JSON schemas exist in multiple versions. In most cases you should use the latest version, which is the default. If you need a different version, pass the keyword argument schema_version when initialising the validator:
```python
submission_file_validator = SubmissionFileValidator(schema_version='0.1.0')
data_file_validator = DataFileValidator(schema_version='0.1.0')
```
Remote Schemas
When using remotely defined schemas, versions depend on the organization providing those schemas, and it is their responsibility to offer a way of keeping track of different schema versions.
The JsonSchemaResolver object resolves $ref references in a JSON schema. The HTTPSchemaDownloader object retrieves schemas from a remote location and can optionally save them in the local file system, following the structure schemas_remote/<org>/<project>/<version>/<schema_name>. For example:
```python
import os

from hepdata_validator.data_file_validator import DataFileValidator
from hepdata_validator.schema_resolver import JsonSchemaResolver
from hepdata_validator.schema_downloader import HTTPSchemaDownloader

data_validator = DataFileValidator()

# Split remote schema path and schema name
schema_path = 'https://scikit-hep.org/pyhf/schemas/1.0.0/'
schema_name = 'workspace.json'

# Create JsonSchemaResolver object to resolve $ref in JSON schema
pyhf_resolver = JsonSchemaResolver(schema_path)

# Create HTTPSchemaDownloader object to validate against remote schema
pyhf_downloader = HTTPSchemaDownloader(pyhf_resolver, schema_path)

# Retrieve and save the remote schema in the local path
pyhf_type = pyhf_downloader.get_schema_type(schema_name)
pyhf_spec = pyhf_downloader.get_schema_spec(schema_name)
pyhf_downloader.save_locally(schema_name, pyhf_spec)

# Load the custom schema as a custom type
pyhf_path = os.path.join(pyhf_downloader.schemas_path, schema_name)
data_validator.load_custom_schema(pyhf_type, pyhf_path)

# Validate a specific schema instance
data_validator.validate(file_path='pyhf_workspace.json', file_type=pyhf_type)
```
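The local layout schemas_remote/<org>/<project>/<version>/<schema_name> can be illustrated with a small helper that splits a schema URL into those parts. This is a hypothetical sketch: the function name is invented, and the rule used here (host name as <org>, first path segment as <project>, the directory holding the schema as <version>) is an assumption about how such a URL might map, not necessarily what HTTPSchemaDownloader does internally. In real code, rely on pyhf_downloader.schemas_path as in the example above.

```python
import os
from urllib.parse import urlparse


def local_schema_path(schema_url: str, root: str = "schemas_remote") -> str:
    """Map a remote schema URL to <root>/<org>/<project>/<version>/<schema_name>.

    Hypothetical mapping: org = host name, project = first path segment,
    version = directory containing the schema, schema_name = last segment.
    """
    parsed = urlparse(schema_url)
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 3:
        raise ValueError(f"cannot derive project/version/name from {schema_url!r}")
    org = parsed.netloc
    project = segments[0]
    version = segments[-2]
    schema_name = segments[-1]
    return os.path.join(root, org, project, version, schema_name)


path = local_schema_path("https://scikit-hep.org/pyhf/schemas/1.0.0/workspace.json")
print(path)
```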
The native HEPData JSON schemas are provided as part of the hepdata-validator package, so it is not necessary to download them. However, for testing purposes, the same mechanism could in principle be used with:
```python
schema_path = 'https://hepdata.net/submission/schemas/1.0.1/'
schema_name = 'data_schema.json'
```
and passing a HEPData YAML data file as the file_path argument of the validate method.
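Resolving a $ref, which the JsonSchemaResolver described above takes care of, typically involves two steps: joining a possibly relative reference against the base URI of the schema (here, schema_path), then following the JSON Pointer in the fragment (the part after #) inside the loaded document. The sketch below illustrates both steps with the standard library only. It is not JsonSchemaResolver's implementation: the function names, the reference defs.json#/definitions/sample and the toy document are hypothetical, nothing is fetched over HTTP, and percent-decoding of the fragment is omitted.

```python
from urllib.parse import urldefrag, urljoin


def resolve_ref_uri(base_uri: str, ref: str) -> str:
    """Resolve a (possibly relative) $ref against the schema's base URI."""
    return urljoin(base_uri, ref)


def resolve_json_pointer(document, pointer: str):
    """Follow an RFC 6901 JSON Pointer such as '/definitions/sample'."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: '~1' -> '/', then '~0' -> '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


base = "https://scikit-hep.org/pyhf/schemas/1.0.0/"
full = resolve_ref_uri(base, "defs.json#/definitions/sample")
url, fragment = urldefrag(full)
print(url)       # https://scikit-hep.org/pyhf/schemas/1.0.0/defs.json
print(fragment)  # /definitions/sample

# A toy in-memory stand-in for a downloaded defs.json (hypothetical content).
defs = {"definitions": {"sample": {"type": "object", "required": ["name", "data"]}}}
print(resolve_json_pointer(defs, fragment))
```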
Built Distribution
Hashes for hepdata_validator-0.2.3-py2.py3-none-any.whl:

Algorithm | Hash digest
---|---
SHA256 | e7e39e92f68536d6749319eb3cec5aec34338237fc01e0fdbf92ea17f87113cf
MD5 | 4f99e296b1520f6dc69db0514fd31149
BLAKE2b-256 | c08a33c69f10e8def5dcfed3ed4e7f1a912cefba14a688241a1ed4a860fea274