Lightning + Bagua

Deep Learning Training Acceleration with Bagua and Lightning AI
Bagua is a deep learning training acceleration framework that supports multiple advanced distributed training algorithms, including:
- Gradient AllReduce for centralized synchronous communication, where gradients are averaged among all workers.
- Decentralized SGD for decentralized synchronous communication, where each worker exchanges data with one or a few specific workers.
- ByteGrad and QAdam for low precision communication, where data is compressed into low precision before communication.
- Asynchronous Model Average for asynchronous communication, where workers are not required to be synchronized in the same iteration in a lock-step style.
By default, Bagua uses the Gradient AllReduce algorithm, which is also the algorithm implemented in DDP, but Bagua can usually produce higher training throughput thanks to its backend written in Rust.
Installation
pip install -U lightning lightning-bagua
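To verify the installation, you can check that the strategy class is importable (a minimal sanity check; BaguaStrategy is the entry point used throughout the examples below):

python -c "from lightning_bagua import BaguaStrategy; print(BaguaStrategy.__name__)"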
Usage
Simply set the strategy argument in the Trainer:
from lightning import Trainer
# train on 4 GPUs (using Bagua mode)
trainer = Trainer(strategy="bagua", accelerator="gpu", devices=4)
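For context, a minimal end-to-end script might look like the following sketch. The toy dataset and module here are illustrative stand-ins, not part of lightning-bagua; only the strategy="bagua" argument is specific to this integration:

import torch
from torch.utils.data import DataLoader, Dataset
import lightning.pytorch as pl
from lightning import Trainer


class RandomDataset(Dataset):
    # toy dataset of random vectors, for illustration only
    def __init__(self, size=64, length=256):
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(64, 2)

    def training_step(self, batch, batch_idx):
        # a dummy loss so the example runs end to end
        return self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    model = ToyModel()
    trainer = Trainer(strategy="bagua", accelerator="gpu", devices=4, max_epochs=1)
    trainer.fit(model, DataLoader(RandomDataset(), batch_size=32))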
By specifying the algorithm in the BaguaStrategy, you can select more advanced training algorithms featured by Bagua:
from lightning import Trainer
from lightning_bagua import BaguaStrategy
# train on 4 GPUs, using Bagua Gradient AllReduce algorithm
trainer = Trainer(
strategy=BaguaStrategy(algorithm="gradient_allreduce"),
accelerator="gpu",
devices=4,
)
# train on 4 GPUs, using Bagua ByteGrad algorithm
trainer = Trainer(
strategy=BaguaStrategy(algorithm="bytegrad"),
accelerator="gpu",
devices=4,
)
# train on 4 GPUs, using Bagua Decentralized SGD
trainer = Trainer(
strategy=BaguaStrategy(algorithm="decentralized"),
accelerator="gpu",
devices=4,
)
# train on 4 GPUs, using Bagua Low Precision Decentralized SGD
trainer = Trainer(
strategy=BaguaStrategy(algorithm="low_precision_decentralized"),
accelerator="gpu",
devices=4,
)
# train on 4 GPUs, using Asynchronous Model Average algorithm, with a synchronization interval of 100ms
trainer = Trainer(
strategy=BaguaStrategy(algorithm="async", sync_interval_ms=100),
accelerator="gpu",
devices=4,
)
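Extra keyword arguments on BaguaStrategy are forwarded to the underlying Bagua algorithm, which is how sync_interval_ms reaches the async algorithm above. As a sketch, assuming the same forwarding applies to the decentralized algorithm's peer_selection_mode and communication_interval parameters (names taken from the Bagua API, not verified against this wrapper):

from lightning import Trainer
from lightning_bagua import BaguaStrategy

# assumption: Bagua's DecentralizedAlgorithm options pass through the strategy
trainer = Trainer(
    strategy=BaguaStrategy(
        algorithm="decentralized",
        peer_selection_mode="shift_one",  # exchange with one shifting peer per step
        communication_interval=1,         # communicate every iteration
    ),
    accelerator="gpu",
    devices=4,
)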
To use QAdam, we need to initialize QAdamOptimizer first:
from lightning import Trainer
import lightning.pytorch as pl
from lightning_bagua import BaguaStrategy
from bagua.torch_api.algorithms.q_adam import QAdamOptimizer
class MyModel(pl.LightningModule):
    ...

    def configure_optimizers(self):
        # initialize the QAdam optimizer
        return QAdamOptimizer(self.parameters(), lr=0.05, warmup_steps=100)
model = MyModel()
trainer = Trainer(
accelerator="gpu",
devices=4,
strategy=BaguaStrategy(algorithm="qadam"),
)
trainer.fit(model)
Bagua relies on its own launcher to schedule jobs. Below, find examples using bagua.distributed.launch, which follows the torch.distributed.launch API:
# start training with 8 GPUs on a single node
python -m bagua.distributed.launch --nproc_per_node=8 train.py
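Since the launcher follows the torch.distributed.launch API, a multi-node job should accept the familiar rendezvous flags. A sketch, assuming the usual --nnodes, --node_rank, --master_addr, and --master_port semantics carry over unchanged:

# on node 0 of 2, 8 GPUs per node
python -m bagua.distributed.launch --nnodes=2 --node_rank=0 --nproc_per_node=8 \
    --master_addr="10.0.0.1" --master_port=29500 train.py

# on node 1 of 2
python -m bagua.distributed.launch --nnodes=2 --node_rank=1 --nproc_per_node=8 \
    --master_addr="10.0.0.1" --master_port=29500 train.py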
If the SSH service is available with passwordless login on each node, you can launch the distributed job from a single node with baguarun, which has a similar syntax to mpirun. When starting the job, baguarun will automatically spawn new processes on each of the training nodes provided by the --host_list option, where each node is described as an IP address followed by an SSH port.
# Run on node1 (or node2) to start training on two nodes (node1 and node2), 8 GPUs per node
baguarun --host_list hostname1:ssh_port1,hostname2:ssh_port2 --nproc_per_node=8 --master_port=port1 train.py
Note

You can also start training in the same way as with Distributed Data Parallel. However, system optimizations like Bagua-Net and performance autotuning can only be enabled through the bagua launcher. It is worth noting that with Bagua-Net, Distributed Data Parallel can also achieve better performance without modifying the training script.
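As an illustration, the Bagua tutorials describe launcher flags for enabling these optimizations; the exact flag names below are taken from those docs as an assumption and should be double-checked against your installed Bagua version:

# assumed flag: enable Bagua-Net via the launcher
python -m bagua.distributed.launch --enable_bagua_net --nproc_per_node=8 train.py

# assumed flag: enable performance autotuning
python -m bagua.distributed.launch --autotune_level 1 --nproc_per_node=8 train.py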
See Bagua Tutorials for more details on installation and advanced features.