🤗 Hugging Face Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but unofficial)
[!WARNING] This is still at a very early stage and subject to major changes.
Features
- 🤗 Straightforward deployment of models from the Hugging Face Hub in Vertex AI
- 🐳 Automatically builds Custom Prediction Routines (CPR) for Hugging Face Hub models using `transformers.pipeline`
- 📦 Everything is packaged within a single method, providing more flexibility and ease of use than the former `google-cloud-aiplatform` SDK for custom models
- 🔌 Seamless integration for running inference on top of any model from the Hugging Face Hub in Vertex AI, thanks to `transformers`
- 🌅 Support for `diffusers` models too!
- 🔍 Custom `logging` messages for better monitoring and debugging via Google Cloud Logging
Get started
Install the `gcloud` CLI and authenticate with your Google Cloud account:

```shell
gcloud init
gcloud auth login
```
Then install `vertex-ai-huggingface-inference-toolkit` via `pip` (quoting the requirement so the shell does not interpret `>=` as a redirection):

```shell
pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"
```
Or via `uv pip install` for faster installations using `uv`:

```shell
uv pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"
```
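To verify the installation, the installed version can be checked from the standard library (this assumes the package was installed into the active environment):

```python
from importlib.metadata import PackageNotFoundError, version

try:
    installed = version("vertex-ai-huggingface-inference-toolkit")
except PackageNotFoundError:
    installed = None  # the package is not installed in this environment

print(installed or "vertex-ai-huggingface-inference-toolkit is not installed")
```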
Example

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.2.0",
    transformers_version="4.38.2",
    python_version="3.10",
    cuda_version="12.3.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
model.deploy(
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
```
Once deployed, we can send requests to it via cURL:

```shell
curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"sequences": "Messi is the GOAT", "candidate_labels": ["football", "basketball", "baseball"]}' \
    <VERTEX_AI_ENDPOINT_URL>/predict
```
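The same request can also be sent from Python. A minimal sketch, assuming the third-party `requests` package and a placeholder endpoint URL (depending on the endpoint configuration, an authentication header may also be required):

```python
import json

# Placeholder: replace with the endpoint URL obtained after `model.deploy`
ENDPOINT_URL = "<VERTEX_AI_ENDPOINT_URL>/predict"

# Same payload as the cURL example above
payload = {
    "sequences": "Messi is the GOAT",
    "candidate_labels": ["football", "basketball", "baseball"],
}
body = json.dumps(payload)

# Uncomment to actually send the request (requires `pip install requests`):
# import requests
# response = requests.post(
#     ENDPOINT_URL,
#     headers={"Content-Type": "application/json"},
#     data=body,
# )
# print(response.json())
print(body)
```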
Example on running on different versions (`torch`, CUDA, Ubuntu, etc.)

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.1.0",
    python_version="3.9",
    cuda_version="11.8.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
```
Example on running on an existing Docker image

To ensure consistency, the image should have been built with `vertex_ai_huggingface_inference_toolkit` in advance.

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    image_uri="us-east1-docker.pkg.dev/huggingface-cloud/vertex-ai-huggingface-inference-toolkit/py3.11-cu12.3.0-torch-2.2.0-transformers-4.38.2:latest",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
```
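The image URI above encodes the build configuration in its tag (`py3.11-cu12.3.0-torch-2.2.0-transformers-4.38.2`). As a sketch, a hypothetical helper (not part of the toolkit's API) that composes a tag in that format from the same version arguments `TransformersModel` accepts:

```python
def image_tag(
    python_version: str,
    cuda_version: str,
    framework: str,
    framework_version: str,
    transformers_version: str,
) -> str:
    """Compose a tag matching the format of the prebuilt image above."""
    return (
        f"py{python_version}-cu{cuda_version}"
        f"-{framework}-{framework_version}"
        f"-transformers-{transformers_version}"
    )

print(image_tag("3.11", "12.3.0", "torch", "2.2.0", "4.38.2"))
# -> py3.11-cu12.3.0-torch-2.2.0-transformers-4.38.2
```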
Example on running TinyLlama for `text-generation`

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    project_id="my-project",
    location="us-east1",
    model_name_or_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    model_kwargs={"torch_dtype": "float16", "attn_implementation": "flash_attention_2"},
    extra_requirements=["flash-attn --no-build-isolation"],
    environment_variables={
        "HF_TASK": "text-generation",
    },
)
```
References / Acknowledgements

This work is heavily inspired by sagemaker-huggingface-inference-toolkit, early work from Philipp Schmid, Hugging Face, and Amazon Web Services.