
Automatically shard your large model across multiple GPUs; works without torch.distributed


tensor_parallel

Run your PyTorch model on multiple GPUs from plain Python

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

from tensor_parallel import tensor_parallel # <- interface for automatic optimal backend selection

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

model = tensor_parallel(model, ["cuda:0", "cuda:1"]) # <- magic happens here
# only half of the model is placed on each GPU, reducing the per-GPU memory footprint twofold

inputs = tokenizer("Translate from German to English: How are you?", return_tensors="pt")["input_ids"].to("cuda:0")
outputs = model.generate(inputs, num_beams=5)
print(tokenizer.decode(outputs[0]))  # Wie sind Sie?
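
The same call is not limited to Hugging Face models: tensor_parallel also accepts a plain torch.nn.Module. The snippet below is a minimal sketch; the nn.Sequential architecture, the layer sizes, and the two-GPU device list are illustrative assumptions, not part of the quick start above.

import torch
from torch import nn

from tensor_parallel import tensor_parallel

# a plain PyTorch module; the layer sizes are arbitrary example values
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

model = tensor_parallel(model, ["cuda:0", "cuda:1"])  # shard the weights across two GPUs

x = torch.randn(8, 1024, device="cuda:0")
y = model(x)          # forward pass uses both GPUs
print(y.shape)        # torch.Size([8, 1024]), same shape as the unsharded module would produce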

Installation

The recommended way to install this package is with pip:

pip install tensor_parallel
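
To check that the installation worked, a quick sanity check (this assumes PyTorch is installed with CUDA support and at least one GPU is visible):

import torch
from tensor_parallel import tensor_parallel  # should import without errors

print(torch.cuda.device_count())  # number of GPUs PyTorch can see, e.g. 2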

Code style

We use black and isort for all pull requests. Before committing your code, run black . && isort . to format it.


