Synapse flat file validation and processing pipeline

Synapse Genie

Introduction

This package can deploy an AACR GENIE-like project on Synapse and perform validation and processing of files.

Installation

Dependencies:

  • Python 3.6 or higher
  • synapseclient (pip install synapseclient)
  • Python pandas (pip install pandas)

pip install synapsegenie
synapsegenie -v

Usage

Creating your own registry

Please view the example registry to learn how to use synapsegenie. synapsegenie lets you create a registry package containing a list of file formats. Each file format class should extend synapsegenie.example_filetype_format.FileTypeFormat. To learn more about creating Python packages, see the Python packaging documentation. Once you have installed your registry package, you can use the synapsegenie command line client.
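The idea of a registry of file format classes can be sketched in plain Python. This is only an illustration under stated assumptions: FileTypeFormatSketch is a hypothetical stand-in for synapsegenie.example_filetype_format.FileTypeFormat, whose real interface is not described here, and the names file_type, required_columns, validate, FORMAT_REGISTRY and validate_text are invented for the example.

```python
import csv
import io


class FileTypeFormatSketch:
    """Hypothetical stand-in for synapsegenie's FileTypeFormat base class."""

    file_type = None  # hypothetical attribute naming the format

    def validate(self, text):
        """Return a list of error strings; an empty list means valid."""
        raise NotImplementedError


class ClinicalFormat(FileTypeFormatSketch):
    """Hypothetical format: a tab-separated file with required columns."""

    file_type = "clinical"
    required_columns = ("PATIENT_ID", "SAMPLE_ID")

    def validate(self, text):
        # Read only the header row and report each missing required column.
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        header = reader.fieldnames or []
        return [
            f"{self.file_type}: missing column {name}"
            for name in self.required_columns
            if name not in header
        ]


# A real registry package would expose its format classes to synapsegenie;
# a plain list stands in for that mechanism here.
FORMAT_REGISTRY = [ClinicalFormat]


def validate_text(file_type, text):
    """Look up a registered format by name and run its validation."""
    for format_cls in FORMAT_REGISTRY:
        if format_cls.file_type == file_type:
            return format_cls().validate(text)
    return [f"no registered format named {file_type}"]
```

Calling validate_text("clinical", "PATIENT_ID\nP1\n") reports the missing SAMPLE_ID column. In an actual registry, follow the example registry for the methods that FileTypeFormat expects subclasses to provide.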

synapsegenie Synapse project

A synapsegenie Synapse project must exist for you to fully utilize this package. The bootstrap-infra command creates this infrastructure in Synapse. If you already have a Synapse project you would like to use, pass it with the --project_id parameter. Otherwise, use the --project_name parameter to create a new Synapse project.

synapsegenie bootstrap-infra --format_registry_packages example_registry \
                             --project_name "My Project Name" \
                             --centers AAA BBB CCC

If you decide to add centers at a later date, you can re-run this command with the full list of centers and the new centers will be added.

synapsegenie bootstrap-infra --format_registry_packages example_registry \
                             --project_id syn12345 \
                             --centers AAA BBB CCC DDD
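
When the bootstrap step is driven from a script, the choice between --project_id and --project_name can be made explicit. The following is a minimal sketch, not part of synapsegenie: the helper name bootstrap_infra_command is hypothetical, it uses only the flags shown above, and it assumes that exactly one of the two project options should be passed.

```python
import shlex


def bootstrap_infra_command(registry_package, centers,
                            project_id=None, project_name=None):
    """Build the argument list for `synapsegenie bootstrap-infra`."""
    # Assumption: the two project options are mutually exclusive.
    if (project_id is None) == (project_name is None):
        raise ValueError("pass exactly one of project_id or project_name")
    command = ["synapsegenie", "bootstrap-infra",
               "--format_registry_packages", registry_package]
    if project_id is not None:
        # Reuse an existing Synapse project.
        command += ["--project_id", project_id]
    else:
        # Create a new Synapse project with this name.
        command += ["--project_name", project_name]
    command += ["--centers", *centers]
    return command


if __name__ == "__main__":
    cmd = bootstrap_infra_command("example_registry",
                                  ["AAA", "BBB", "CCC", "DDD"],
                                  project_id="syn12345")
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once synapsegenie and the registry package are installed.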

File Validator

The synapsegenie package also provides a command to run the validator locally on your files. Please view the help to see how to run the validator.

synapsegenie validate-single-file -h

# Run bootstrap-infra first to create a Synapse project
synapsegenie validate-single-file /path/to/file center_name \
             --format_registry_packages example_registry \
             --project_id syn12345

Validation/Processing

synapsegenie will validate and process all the files uploaded by centers. Every valid file will be processed and uploaded into Synapse tables.

synapsegenie process -h

# only validate
synapsegenie process --format_registry_packages example_registry \
                     --project_id syn12345 \
                     --only_validate

# validate + process
synapsegenie process --format_registry_packages example_registry \
                     --project_id syn12345
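
The two invocations above can be chained from Python so that full processing starts only after a validation-only pass has completed. This is a sketch under assumptions: process_command and validate_then_process are hypothetical helper names, only the flags shown above are used, and whether synapsegenie exits with a non-zero status when files are invalid is not stated here, so the check only guards against a run that fails outright.

```python
import subprocess


def process_command(registry_package, project_id, only_validate=False):
    """Build the argument list for `synapsegenie process`."""
    command = ["synapsegenie", "process",
               "--format_registry_packages", registry_package,
               "--project_id", project_id]
    if only_validate:
        command.append("--only_validate")
    return command


def validate_then_process(registry_package, project_id, run=subprocess.run):
    """Run a validation-only pass, then the validate + process pass."""
    # check=True raises CalledProcessError on a non-zero exit status,
    # which stops the sequence before the processing pass.
    run(process_command(registry_package, project_id, only_validate=True),
        check=True)
    run(process_command(registry_package, project_id), check=True)
```

The run parameter defaults to subprocess.run and exists so the sequence can be exercised with a stub, without calling the real command.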

Contributing

To learn how to contribute, please read the contributing guide.
