
Graphsignal: Inference Profiler


Graphsignal is a machine learning inference profiler. It helps data scientists and ML engineers make model inference faster and more efficient. It is built for real-world use cases and allows ML practitioners to:

  • Optimize inference by benchmarking latency and throughput and by analyzing execution traces, operation-level statistics, and compute utilization.
  • Start profiling scripts and notebooks automatically by adding a few lines of code.
  • Use the profiler in local, remote, or cloud environments without installing any additional software or opening inbound ports.
  • Keep data private; no code or data is sent to Graphsignal cloud, only run statistics and metadata.

Dashboards

Learn more at graphsignal.com.

Documentation

See full documentation at graphsignal.com/docs.

Getting Started

1. Installation

Install the profiler by running:

pip install graphsignal

Or clone and install the GitHub repository:

git clone https://github.com/graphsignal/graphsignal.git
cd graphsignal
python setup.py install

2. Configuration

Configure the profiler by specifying your API key and workload name directly or via environment variables.

import graphsignal

graphsignal.configure(api_key='my_api_key', workload_name='job1')

To get an API key, sign up for a free account at graphsignal.com. The key can then be found in your account's Settings / API Keys page.
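
If you prefer not to hard-code the key, one option is to read it from the environment yourself and pass it in explicitly. This is only a sketch: the variable name below is an example, and the configuration documentation lists the variables the profiler reads natively.

import os
import graphsignal

# GRAPHSIGNAL_API_KEY is an example variable name chosen for this sketch
graphsignal.configure(
    api_key=os.environ['GRAPHSIGNAL_API_KEY'],
    workload_name='job1')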

workload_name identifies the job, application or service that is being profiled.

One workload can be run multiple times, e.g. to benchmark different parameters. To tag each run, use graphsignal.add_tag('mytag').

If multiple subsequent runs or experiments are executed within a single script or notebook, call graphsignal.end_run() to end the current run, upload it, and initialize a new one.
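
For example, a run-per-configuration loop might look like the following sketch; the batch sizes and tag values are illustrative.

import graphsignal

graphsignal.configure(api_key='my_api_key', workload_name='batch_size_benchmark')

for batch_size in [1, 8, 32]:  # illustrative configurations
    graphsignal.add_tag('batch_size_' + str(batch_size))
    # ... run and profile inferences for this configuration ...
    graphsignal.end_run()  # upload the current run and start a new one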

Graphsignal has built-in support for distributed inference. See the Distributed Workloads section for more information.

3. Profiling

Use the following minimal examples to integrate Graphsignal into your machine learning script. See the integration documentation and the profiling API reference for full details.

When the profile_inference method is used repeatedly, all inferences will be measured, but only a few will be profiled to ensure low overhead.

TensorFlow

from graphsignal.profilers.tensorflow import profile_inference

with profile_inference():
    # single or batch prediction, e.g.:
    preds = model(x)

Keras

from graphsignal.profilers.keras import GraphsignalCallback

model.predict(..., callbacks=[GraphsignalCallback()])
# or model.evaluate(..., callbacks=[GraphsignalCallback()])

PyTorch

from graphsignal.profilers.pytorch import profile_inference

with profile_inference():
    # single or batch prediction, e.g.:
    preds = model(x)

PyTorch Lightning

from pytorch_lightning import Trainer
from graphsignal.profilers.pytorch_lightning import GraphsignalCallback

trainer = Trainer(..., callbacks=[GraphsignalCallback()])
trainer.predict() # or trainer.validate() or trainer.test()

Hugging Face

from transformers import pipeline
from graphsignal.profilers.pytorch import profile_inference
# or from graphsignal.profilers.tensorflow import profile_inference

generator = pipeline(task="text-generation")

with profile_inference():
    output = generator('some text')

JAX

from graphsignal.profilers.jax import profile_inference

with profile_inference():
    # single or batch prediction, e.g.:
    preds = model(x)

ONNX Runtime

import onnxruntime
from graphsignal.profilers.onnxruntime import initialize_profiler, profile_inference

sess_options = onnxruntime.SessionOptions()
initialize_profiler(sess_options)

session = onnxruntime.InferenceSession('my_model_path', sess_options)
with profile_inference(session):
    session.run(...)

Other frameworks

from graphsignal.profilers.generic import profile_inference

with profile_inference():
    # single or batch prediction, e.g.:
    preds = model(x)

4. Logging

Logging parameters and metrics enables benchmarking inference latency and throughput against the logged values. For example, logging evaluation accuracy in optimization runs is useful for ensuring that accuracy is not affected by inference optimizations, or for identifying the best tradeoff.

graphsignal.log_param('my_param', 'val')
graphsignal.log_metric('my_metric', 0.9)

Parameters and metrics can also be passed via environment variables. See profiling API reference for full documentation.
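
As a sketch of the workflow described above, an optimization run might record its configuration and evaluation accuracy alongside the profiled inferences. The parameter and metric names, as well as data, model, and evaluate, are illustrative placeholders.

import graphsignal
from graphsignal.profilers.pytorch import profile_inference

graphsignal.configure(api_key='my_api_key', workload_name='optimized_inference')
graphsignal.log_param('batch_size', 32)  # illustrative parameter

for x in data:
    with profile_inference():
        preds = model(x)

# illustrative evaluation step; log the resulting accuracy for comparison
graphsignal.log_metric('eval_accuracy', evaluate(model))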

5. Dashboards

After profiling is set up, open Graphsignal to analyze the recorded profiles.

Example

# 1. Import Graphsignal modules
import graphsignal
from graphsignal.profilers.pytorch import profile_inference

# 2. Configure
graphsignal.configure(api_key='my_key', workload_name='my_gpu_inference')

....

# 3. Use the profile_inference method to measure and profile single or batch predictions
for x in data:
    with profile_inference():
        preds = model(x)

More integration examples are available in the examples repo.

Overhead

Although profiling may add some overhead to applications, Graphsignal Profiler only profiles certain inferences, automatically limiting the overhead.

Security and Privacy

Graphsignal Profiler can only open outbound connections to profile-api.graphsignal.com and send data; no inbound connections or commands are possible.

No code or data is sent to Graphsignal cloud, only run statistics and metadata.

Troubleshooting

To enable debug logging, add debug_mode=True to configure(). If the debug log doesn't give you any hints on how to fix a problem, please report it to our support team via your account.
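
For example, extending the configuration shown earlier:

import graphsignal

# same configure call as before, with debug logging enabled
graphsignal.configure(api_key='my_api_key', workload_name='job1', debug_mode=True)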

In case of connection issues, please make sure outgoing connections to https://profile-api.graphsignal.com are allowed.

For GPU profiling, if the libcupti library fails to load, make sure the NVIDIA® CUDA® Profiling Tools Interface (CUPTI) is installed by running:

/sbin/ldconfig -p | grep libcupti
