triton-model-navigator

Triton Model Navigator: An inference toolkit for optimizing and deploying machine learning models and pipelines on the Triton Inference Server and PyTriton.

Welcome to the Triton Model Navigator, an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. The Triton Model Navigator streamlines the process of moving models and pipelines implemented in PyTorch, TensorFlow, and ONNX to TensorRT.

The Triton Model Navigator automates several critical steps, including model export, conversion, correctness testing, and profiling. By providing a single entry point for various supported frameworks, users can efficiently search for the best deployment option using the per-framework optimize function. The resulting optimized models are ready for deployment on either PyTriton or Triton Inference Server.

Features at a Glance

The distinct capabilities of the Triton Model Navigator are summarized in the feature matrix:

Feature	Description
Ease-of-use	Single line of code to run all possible optimization paths directly from your source code
Wide Framework Support	Compatible with various machine learning frameworks including PyTorch, TensorFlow, and ONNX
Models Optimization	Enhance the performance of models such as ResNet and BERT for efficient inference deployment
Pipelines Optimization	Streamline Python code pipelines for models such as Stable Diffusion and Whisper using Inplace Optimization, exclusive to PyTorch
Model Export and Conversion	Automate the process of exporting and converting models between various formats with a focus on TensorRT and Torch-TensorRT
Correctness Testing	Ensures the converted model produces correct outputs by validating them against the original model
Performance Profiling	Profiles models to select the optimal format based on performance metrics such as latency and throughput to optimize target hardware utilization
Models Deployment	Automates models and pipelines deployment on PyTriton and Triton Inference Server through a dedicated API
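
The correctness-testing and profiling rows can be pictured with a small, library-independent sketch. This is not the Navigator's implementation: the `Candidate` record, its field names, the tolerance, and the sample numbers are all hypothetical, chosen only to illustrate discarding formats whose outputs diverge from the reference and then picking the fastest remaining one.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Candidate:
    # Hypothetical record for one converted format; not a Model Navigator type.
    name: str
    outputs: Sequence[float]  # outputs produced for a fixed test input
    latency_ms: float         # measured average latency (made-up numbers below)
    throughput: float         # measured inferences per second (made-up numbers below)


def is_correct(candidate: Candidate, reference: Sequence[float], rel_tol: float = 1e-2) -> bool:
    """Check that every output is close to the reference output."""
    if len(candidate.outputs) != len(reference):
        return False
    return all(
        math.isclose(got, expected, rel_tol=rel_tol)
        for got, expected in zip(candidate.outputs, reference)
    )


def select_best(candidates: List[Candidate], reference: Sequence[float]) -> Optional[Candidate]:
    """Drop candidates that fail the correctness check, then take the highest throughput."""
    valid = [c for c in candidates if is_correct(c, reference)]
    if not valid:
        return None
    return max(valid, key=lambda c: c.throughput)


reference = [0.10, 0.70, 0.20]
candidates = [
    Candidate("onnx", [0.10, 0.70, 0.20], latency_ms=4.0, throughput=250.0),
    Candidate("tensorrt-fp16", [0.1001, 0.6999, 0.2000], latency_ms=1.5, throughput=660.0),
    Candidate("broken", [0.50, 0.30, 0.20], latency_ms=1.0, throughput=1000.0),
]
best = select_best(candidates, reference)
print(best.name)  # tensorrt-fp16
```

The "broken" candidate is the fastest but is rejected because its outputs do not match the reference, which is why correctness testing has to run before any performance-based selection.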

Documentation

Learn more about the Triton Model Navigator features in the documentation.

Prerequisites

Before proceeding with the installation of the Triton Model Navigator, ensure your system meets the following criteria:

Operating System: Linux (Ubuntu 20.04+ recommended)
Python: Version 3.8 or newer
NVIDIA GPU
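
The criteria above can be checked up front with a few lines of standard-library Python. This is a minimal sketch: the `check_prerequisites` helper is hypothetical, and looking for `nvidia-smi` on the `PATH` is only a heuristic for the presence of an NVIDIA driver, not a guarantee that a usable GPU is available.

```python
import platform
import shutil
import sys
from typing import List, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if platform.system() != "Linux":
        problems.append(f"Unsupported OS: {platform.system()} (Linux is required)")
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is required, found {found}")
    # Heuristic only: nvidia-smi ships with the NVIDIA driver.
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH; the NVIDIA driver may be missing")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```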

You can use NGC Containers for PyTorch and TensorFlow, which contain all necessary dependencies.

Install

The Triton Model Navigator can be installed from pypi.org by running the following command:

pip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator[<extras,>]

Installing with PyTorch extras:

pip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator[torch]

Installing with TensorFlow extras:

pip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator[tensorflow]
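
To confirm that the installation succeeded, the installed version can be read from the package metadata. This is a small sketch using only the standard library; the `installed_version` helper is hypothetical and simply returns `None` when the distribution is not installed.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "triton-model-navigator") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())
```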

Optimize Stable Diffusion with Inplace

Inplace Optimize allows seamless optimization of models for deployment, such as converting them to TensorRT, without requiring any changes to the original Python pipelines.

For the Stable Diffusion model, initialize the pipeline and wrap the model components with nav.Module:

import model_navigator as nav
from transformers.modeling_outputs import BaseModelOutputWithPooling
from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline


def get_pipeline():
    # Initialize Stable Diffusion pipeline and wrap modules for optimization
    pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to("cuda")

    pipe.text_encoder = nav.Module(
        pipe.text_encoder,
        name="clip",
        output_mapping=lambda output: BaseModelOutputWithPooling(**output),
    )
    pipe.unet = nav.Module(
        pipe.unet,
        name="unet",
    )
    pipe.vae.decoder = nav.Module(
        pipe.vae.decoder,
        name="vae",
    )

    return pipe

Prepare a simple dataloader:

def get_dataloader():
    # Note: the first element of each tuple must be the batch size
    return [(1, "a photo of an astronaut riding a horse on mars")]

Execute model optimization:

pipe = get_pipeline()
dataloader = get_dataloader()

nav.optimize(pipe, dataloader)

Once the pipeline has been optimized, you can explicitly load the most performant version of the modules by executing:

nav.load_optimized()

After this method is executed, any module for which an optimized version exists will use that version when the pipeline runs directly in Python. An example of how to serve the Stable Diffusion pipeline through PyTriton can be found here.

Optimize ResNet and deploy on Triton

The Triton Model Navigator also supports an optimization path for deployment on Triton. This path is supported for nn.Module, keras.Model, or ONNX files whose inputs are tensors.

To optimize the ResNet50 model from TorchHub, run the following code:

import torch
import model_navigator as nav

# Optimize Torch model loaded from TorchHub
package = nav.torch.optimize(
    model=torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True).eval(),
    dataloader=[torch.randn(1, 3, 256, 256) for _ in range(10)],
)

Once optimization is done, creating a model store for deployment on Triton is as simple as the following code:

import pathlib

# Generate the model store from optimized model
nav.triton.model_repository.add_model_from_package(
    model_repository_path=pathlib.Path("model_repository"),
    model_name="resnet50",
    package=package,
    strategy=nav.MaxThroughputStrategy(),
)
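
Triton Inference Server expects each model in its own directory containing a `config.pbtxt` file and numbered version subdirectories; the name of the model file inside a version directory depends on the format that was selected. To see what was generated, the directory can be listed with a small helper. The `list_model_repository` function below is a hypothetical convenience, not part of the Navigator API.

```python
import pathlib
from typing import List


def list_model_repository(root: pathlib.Path) -> List[str]:
    """Return sorted, POSIX-style relative paths of all files under the repository root."""
    if not root.is_dir():
        return []
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    # Prints nothing if the directory has not been created yet.
    for entry in list_model_repository(pathlib.Path("model_repository")):
        print(entry)
```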

Profile any model or callable in Python

In addition to optimizing models and pipelines, the Triton Model Navigator provides a uniform method for profiling any Python function, callable, or model. At present, only static batch profiling scenarios are supported.

As an example, we will use a function that simply sleeps for 50 ms:

import time


def custom_fn(input_):
    # wait 50ms
    time.sleep(0.05)
    return input_

Let’s provide a dataloader we will use for profiling:

# Tuple of batch size and data sample
dataloader = [(1, ["This is example input"])]

Finally, run the profiling of the function with the prepared dataloader:

nav.profile(custom_fn, dataloader)
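
For intuition about what static batch profiling measures, the timing can be approximated by hand with the standard library. The sketch below is not how `nav.profile` is implemented; the `measure_latency` helper, its parameters, and the keys of the returned dictionary are hypothetical and only illustrate latency and throughput at a fixed batch size.

```python
import statistics
import time
from typing import Any, Callable, Dict


def measure_latency(
    fn: Callable[[Any], Any],
    sample: Any,
    batch_size: int = 1,
    warmup: int = 2,
    repeats: int = 10,
) -> Dict[str, float]:
    """Time repeated calls of fn(sample) and derive throughput for a fixed batch size."""
    for _ in range(warmup):
        fn(sample)  # warm-up calls are executed but not timed

    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(sample)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    avg_ms = statistics.mean(latencies_ms)
    return {
        "avg_latency_ms": avg_ms,
        "min_latency_ms": min(latencies_ms),
        "max_latency_ms": max(latencies_ms),
        # Samples processed per second at this batch size.
        "throughput_per_s": batch_size * 1000.0 / avg_ms if avg_ms > 0 else float("inf"),
    }


if __name__ == "__main__":
    def custom_fn(input_):
        time.sleep(0.05)  # wait 50 ms
        return input_

    print(measure_latency(custom_fn, ["This is example input"], batch_size=1))
```

For the 50 ms sleep function, the average latency should come out slightly above 50 ms, which corresponds to a little under 20 calls per second at batch size 1.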

Examples

We offer comprehensive, step-by-step guides that showcase the utilization of the Triton Model Navigator’s diverse features. These guides are designed to elucidate the processes of optimization, profiling, testing, and deployment of models using PyTriton and Triton Inference Server.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distribution

triton_model_navigator-0.8.1-py3-none-any.whl (317.9 kB)

Uploaded Apr 4, 2024 Python 3

Hashes for triton_model_navigator-0.8.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5514b5e218c67107cb438f32f4253b777789d35bd91afb55055c366c3d13318a`
MD5	`b965fecf729161feb103c8e147ef6309`
BLAKE2b-256	`45298ff2afc7a2649ddca57998fba8ecf9dafafef01535de9c497b03eaa62cf0`