scitrack

One of the critical challenges in scientific analysis is to track all the elements involved. These include the arguments provided to a specific application, the input data files referenced by those arguments, and the output data generated by the application. A minimal set of system-specific information should also be tracked.

scitrack is a library for developers of scientific software that supports this tracking of scientific computation. It provides elementary logging functionality: its primary capabilities are generating checksums on input and output files and logging the computational environment.

Installing

For the released version:

$ pip install scitrack

For the very latest version:

$ pip install git+https://github.com/HuttleyLab/scitrack

Or clone it:

$ git clone git@github.com:HuttleyLab/scitrack.git

And then install:

$ pip install ~/path/to/scitrack

CachingLogger

scitrack provides a single class, CachingLogger. It is essentially a wrapper around the logging module that, on invocation, also captures basic information about the system and the command line call that invoked the application.

In addition, the class provides convenience methods for logging both the path and the md5 hexdigest checksum of input/output files. A method is also provided for producing checksums of text data. The latter is useful for the case when data are from a stream or a database, for instance.
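To make the idea of these checksums concrete, here is a minimal sketch using only the standard library hashlib module. The helper names file_md5 and text_md5 are hypothetical and are not part of scitrack; the library's own implementation may differ.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as infile:
        # read fixed-size chunks so large files need not fit in memory
        for chunk in iter(lambda: infile.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def text_md5(data, encoding="utf-8"):
    """Return the md5 hex digest of a str (encoded first) or of bytes."""
    if isinstance(data, str):
        data = data.encode(encoding)
    return hashlib.md5(data).hexdigest()
```

Hashing a file and hashing its contents as text give the same digest only when the bytes are identical, which is why the encoding is made explicit here.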

All logging calls are cached until a path for a logfile is provided. The logger can also, optionally, create directories.
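The caching behaviour can be pictured with a small stand-alone sketch. The DeferredLog class below is hypothetical and only illustrates the idea of holding messages until a path is assigned; it is not how scitrack is implemented.

```python
import os


class DeferredLog:
    """Hold messages in memory until a log file path is assigned."""

    def __init__(self, create_dir=False):
        self.create_dir = create_dir
        self._cache = []
        self._path = None

    @property
    def log_file_path(self):
        return self._path

    @log_file_path.setter
    def log_file_path(self, path):
        dirname = os.path.dirname(path)
        if self.create_dir and dirname:
            os.makedirs(dirname, exist_ok=True)
        self._path = path
        # flush everything that was logged before the path was known
        with open(path, "w") as out:
            out.writelines(line + "\n" for line in self._cache)
        self._cache.clear()

    def log_message(self, msg, label="misc"):
        line = f"{label} : {msg}"
        if self._path is None:
            self._cache.append(line)
        else:
            with open(self._path, "a") as out:
                out.write(line + "\n")
```

With this design, an application can start logging immediately and decide where the log file lives later, for example once an output directory argument has been parsed.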

When run in parallel using mpirun, the process ID is appended to the hostname to help identify processors.
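A label of this form can be built from the standard library alone. The function name host_label is hypothetical, and this sketch only shows the hostname:pid format; it does not reproduce scitrack's own code.

```python
import os
import socket


def host_label():
    """Return 'hostname:pid' for the current process."""
    return f"{socket.gethostname()}:{os.getpid()}"
```

Each process started by mpirun has its own process ID, so lines written by different processes on the same host remain distinguishable.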

Simple instantiation of the logger

Create the logger. Setting create_dir=True means that, when the log file is created, any missing directories in its path are created as well.

from scitrack import CachingLogger
LOGGER = CachingLogger(create_dir=True)
LOGGER.log_file_path = "somedir/some_path.log"

The last assignment triggers creation of somedir/some_path.log.

Capturing a program's arguments and options

scitrack will write the contents of sys.argv to the log file, prefixed by command_string. However, this only captures arguments specified on the command line. Tracking the value of optional arguments not specified, which may have default values, is critical to tracking the full command set. Doing this is your responsibility as a developer.

Here’s one approach when using the click command line interface library. Below we create a simple click app and capture the required and optional argument values.

from scitrack import CachingLogger
import click

LOGGER = CachingLogger()

@click.group()
def main():
    """the main command"""
    pass

@main.command()
@click.option('-i', '--infile', type=click.Path(exists=True))
@click.option('-t', '--test', is_flag=True, help='Run test.')
def my_app(infile, test):
    # capture the local variables, at this point just provided arguments
    LOGGER.log_args()
    LOGGER.log_versions('numpy')
    LOGGER.input_file(infile)
    LOGGER.log_file_path = "some_path.log"

if __name__ == "__main__":
    my_app()

The CachingLogger.log_message() method takes a message and a label. All other logging methods wrap log_message(), providing a specific label. For instance, the method input_file() writes two lines to the log:

  • input_file_path, the absolute path to the input file

  • input_file_path md5sum, the hex digest of the file

output_file() behaves analogously. An additional method, text_data(), is useful for other data input/output sources (e.g. records from a database). For this to be useful with arbitrary data types, the conversion to text must be systematic and robust across platforms.
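One possible way to make a text checksum platform independent is to normalise the text before hashing it. The sketch below is an assumption about what "robust" could mean in practice, and canonical_text_digest is a hypothetical helper, not a scitrack function.

```python
import hashlib
import unicodedata


def canonical_text_digest(text):
    """md5 hex digest of text after platform-independent normalisation."""
    # unify Windows (\r\n) and classic Mac (\r) line endings
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # use one Unicode representation for visually identical strings
    text = unicodedata.normalize("NFC", text)
    # always hash the same byte encoding
    return hashlib.md5(text.encode("utf-8")).hexdigest()
```

Without such a step, the same database record exported on two operating systems could yield different checksums even though the data are identical.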

The log_args() method captures all local variables within a scope.
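Capturing the local variables of a calling scope can be done with the standard library inspect module. The following is a sketch of the general technique only; caller_locals is a hypothetical name, and scitrack's log_args() may work differently.

```python
import inspect


def caller_locals():
    """Return a copy of the local variables of the calling function."""
    frame = inspect.currentframe().f_back
    try:
        return dict(frame.f_locals)
    finally:
        del frame  # avoid keeping a reference cycle alive


def my_app(infile, test=False):
    # called first, so only the arguments are in scope
    return caller_locals()
```

Because default values are ordinary local variables, calling my_app("data.fasta") returns both infile and the unspecified test=False, which is what makes this approach useful for recording the full set of options.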

The log_versions() method captures the version of the current file and of each package in a list of named packages, e.g. LOGGER.log_versions(['numpy', 'sklearn']).
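For comparison, installed package versions can be looked up with the standard library importlib.metadata module (Python 3.8 or later). This sketch is not scitrack's implementation; package_versions and the "unknown" fallback are assumptions made for illustration.

```python
from importlib import metadata


def package_versions(names):
    """Return 'name==version' strings for the named distributions."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # hypothetical fallback for packages that are not installed
            lines.append(f"{name}==unknown")
    return lines
```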

Some sample output

2018-11-28 11:33:30 yourmachine.com:71779   INFO    system_details : system=Darwin Kernel Version 18.2.0: Fri Oct  5 19:41:49 PDT 2018; root:xnu-4903.221.2~2/RELEASE_X86_64
2018-11-28 11:33:30 yourmachine.com:71779   INFO    python : 3.7.1
2018-11-28 11:33:30 yourmachine.com:71779   INFO    user : gavin
2018-11-28 11:33:30 yourmachine.com:71779   INFO    command_string : /Users/gavin/miniconda3/envs/py37/bin/py.test -s
2018-11-28 11:33:30 yourmachine.com:71779   INFO    input_file_path : /Users/gavin/repos/SciTrack/tests/sample.fasta
2018-11-28 11:33:30 yourmachine.com:71779   INFO    input_file_path md5sum : 96eb2c2632bae19eb65ea9224aaafdad
2018-11-28 11:33:30 yourmachine.com:71779   INFO    version : test_logging==0.1.5
2018-11-28 11:33:30 yourmachine.com:71779   INFO    version : numpy==1.15.1
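Log lines in this layout are straightforward to read back, for example to compare a recorded checksum against a file on disk. The parser below is a sketch that assumes the whitespace-separated layout of the sample above; parse_line is a hypothetical helper, and real log files may separate the fields differently.

```python
import re

# date, time, host:pid, level, then "label : value"
LINE = re.compile(
    r"^(?P<date>\S+)\s+(?P<time>\S+)\s+(?P<host>\S+)\s+(?P<level>\S+)\s+"
    r"(?P<label>.+?) : (?P<value>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

The label is matched lazily up to the first " : " so that labels containing spaces, such as input_file_path md5sum, are kept intact.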

Other useful functions

Two other useful functions are get_file_hexdigest and get_text_hexdigest. The latter can take either unicode or ascii strings.
