
🤗 Hugging Face Inference Toolkit for Google Cloud Vertex AI

[!WARNING] This is still at a very early stage and subject to major changes.

Features

  • 🤗 Straightforward way of deploying models from the Hugging Face Hub in Vertex AI
  • 🐳 Automatically build Custom Prediction Routines (CPR) for Hugging Face Hub models using transformers.pipeline
  • 📦 Everything is packaged within a single method, offering more flexibility and ease of use than the plain google-cloud-aiplatform SDK for custom models
  • 🔌 Seamless integration for running inference on top of any model from the Hugging Face Hub in Vertex AI thanks to transformers
  • 🌅 Support for diffusers models too!
  • 🔍 Includes custom logging messages for better monitoring and debugging via Google Cloud Logging

Get started

Install the gcloud CLI and authenticate with your Google Cloud account:

gcloud init
gcloud auth login

Then install vertex-ai-huggingface-inference-toolkit via pip install:

pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"

Or via uv pip install for faster installations using uv:

uv pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"

Example

from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.2.0",
    transformers_version="4.38.2",
    python_version="3.10",
    cuda_version="12.3.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
model.deploy(
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)

Once deployed, we can send requests to it via cURL:

curl -X POST -H "Content-Type: application/json" -d '{"sequences": "Messi is the GOAT", "candidate_labels": ["football", "basketball", "baseball"]}' <VERTEX_AI_ENDPOINT_URL>/predict
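The same request can also be sent from Python. Below is a minimal sketch using the `requests` package, where the endpoint URL placeholder and the payload keys (`sequences`, `candidate_labels`) are taken from the cURL example above; the keys are forwarded to the underlying zero-shot-classification pipeline:

```python
import json

# Payload mirroring the cURL example above; the keys match the
# arguments of the zero-shot-classification pipeline.
payload = {
    "sequences": "Messi is the GOAT",
    "candidate_labels": ["football", "basketball", "baseball"],
}

# Replace with the URL of your deployed Vertex AI endpoint.
endpoint_url = "<VERTEX_AI_ENDPOINT_URL>/predict"

# To actually send the request (requires the `requests` package):
#   import requests
#   response = requests.post(
#       endpoint_url,
#       headers={"Content-Type": "application/json"},
#       data=json.dumps(payload),
#   )
#   print(response.json())
body = json.dumps(payload)
```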

Example of running with different versions (`torch`, CUDA, Ubuntu, etc.)

from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.1.0",
    python_version="3.9",
    cuda_version="11.8.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
Example of running from an existing Docker image

For this approach to work consistently, the image should have been built in advance using vertex-ai-huggingface-inference-toolkit.

from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    image_uri="us-east1-docker.pkg.dev/huggingface-cloud/vertex-ai-huggingface-inference-toolkit/py3.11-cu12.3.0-torch-2.2.0-transformers-4.38.2:latest",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
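The image tag above encodes the Python, CUDA, framework, and transformers versions. As an illustration of that naming scheme (inferred from the example URI above, not part of the toolkit's public API), a small helper could assemble such a tag:

```python
def build_image_tag(python_version: str, cuda_version: str,
                    framework: str, framework_version: str,
                    transformers_version: str) -> str:
    """Build an image tag following the naming scheme seen above.

    Note: this helper is illustrative only; the scheme is inferred
    from the example image URI, not defined by the toolkit.
    """
    return (
        f"py{python_version}-cu{cuda_version}"
        f"-{framework}-{framework_version}"
        f"-transformers-{transformers_version}"
    )

tag = build_image_tag("3.11", "12.3.0", "torch", "2.2.0", "4.38.2")
# tag == "py3.11-cu12.3.0-torch-2.2.0-transformers-4.38.2"
```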
Example of running TinyLlama for `text-generation`

from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    project_id="my-project",
    location="us-east1",
    model_name_or_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    model_kwargs={"torch_dtype": "float16", "attn_implementation": "flash_attention_2"},
    extra_requirements=["flash-attn --no-build-isolation"],
    environment_variables={
        "HF_TASK": "text-generation",
    },
)

References / Acknowledgements

This work is heavily inspired by the early work on sagemaker-huggingface-inference-toolkit by Philipp Schmid, Hugging Face, and Amazon Web Services.
