🤗 Hugging Face Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but unofficial)
[!WARNING] This is still at a very early stage and subject to major changes.
Features
- 🤗 Straightforward deployment of models from the Hugging Face Hub in Vertex AI
- 🐳 Automatically builds Custom Prediction Routines (CPR) for Hugging Face Hub models using `transformers.pipeline`
- 📦 Everything is packaged within a single method, providing more flexibility and ease of use than the former `google-cloud-aiplatform` SDK for custom models
- 🔌 Seamless integration for running inference on top of any model from the Hugging Face Hub in Vertex AI, thanks to `transformers`
- 🌅 Support for `diffusers` models too!
- 🔍 Custom `logging` messages for better monitoring and debugging via Google Cloud Logging
Get started
Install the `gcloud` CLI and authenticate with your Google Cloud account:

```shell
gcloud init
gcloud auth login
```
Then install `vertex-ai-huggingface-inference-toolkit` via `pip` (quoting the requirement so the shell does not interpret `>=` as a redirection):

```shell
pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"
```
Or via `uv pip install` for faster installations using `uv`:

```shell
uv pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"
```
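To verify the installation, the installed version can be checked from the standard library (this assumes the package was installed into the active environment):

```python
from importlib.metadata import PackageNotFoundError, version

try:
    installed = version("vertex-ai-huggingface-inference-toolkit")
except PackageNotFoundError:
    installed = None  # the package is not installed in this environment

print(installed or "vertex-ai-huggingface-inference-toolkit is not installed")
```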
Example

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.2.0",
    transformers_version="4.38.2",
    python_version="3.10",
    cuda_version="12.3.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
model.deploy(
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
```
Once deployed, we can send requests to it via cURL:

```shell
curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"sequences": "Messi is the GOAT", "candidate_labels": ["football", "basketball", "baseball"]}' \
    <VERTEX_AI_ENDPOINT_URL>/predict
```
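The same request can also be sent from Python. A minimal sketch, assuming the third-party `requests` package and a placeholder endpoint URL (depending on the endpoint configuration, an authentication header may also be required):

```python
import json

# Placeholder: replace with the endpoint URL obtained after `model.deploy`
ENDPOINT_URL = "<VERTEX_AI_ENDPOINT_URL>/predict"

# Same payload as the cURL example above
payload = {
    "sequences": "Messi is the GOAT",
    "candidate_labels": ["football", "basketball", "baseball"],
}
body = json.dumps(payload)

# Uncomment to actually send the request (requires `pip install requests`):
# import requests
# response = requests.post(
#     ENDPOINT_URL,
#     headers={"Content-Type": "application/json"},
#     data=body,
# )
# print(response.json())
print(body)
```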
Example on running on different versions (`torch`, CUDA, Ubuntu, etc.)

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.1.0",
    python_version="3.9",
    cuda_version="11.8.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
```
Example on running on an existing Docker image

To ensure consistency, the image should have been built with `vertex_ai_huggingface_inference_toolkit` in advance.

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    image_uri="us-east1-docker.pkg.dev/huggingface-cloud/vertex-ai-huggingface-inference-toolkit/py3.11-cu12.3.0-torch-2.2.0-transformers-4.38.2:latest",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
```
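The image URI above encodes the build configuration in its tag (`py3.11-cu12.3.0-torch-2.2.0-transformers-4.38.2`). As a sketch, a hypothetical helper (not part of the toolkit's API) that composes a tag in that format from the same version arguments `TransformersModel` accepts:

```python
def image_tag(
    python_version: str,
    cuda_version: str,
    framework: str,
    framework_version: str,
    transformers_version: str,
) -> str:
    """Compose a tag matching the format of the prebuilt image above."""
    return (
        f"py{python_version}-cu{cuda_version}"
        f"-{framework}-{framework_version}"
        f"-transformers-{transformers_version}"
    )

print(image_tag("3.11", "12.3.0", "torch", "2.2.0", "4.38.2"))
# -> py3.11-cu12.3.0-torch-2.2.0-transformers-4.38.2
```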
Example on running TinyLlama for `text-generation`

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    project_id="my-project",
    location="us-east1",
    model_name_or_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    model_kwargs={"torch_dtype": "float16", "attn_implementation": "flash_attention_2"},
    extra_requirements=["flash-attn --no-build-isolation"],
    environment_variables={
        "HF_TASK": "text-generation",
    },
)
```
References / Acknowledgements

This work is heavily inspired by sagemaker-huggingface-inference-toolkit, early work from Philipp Schmid, Hugging Face, and Amazon Web Services.