pytorch_optimizer

Optimizer and learning-rate scheduler implementations in PyTorch with clean code and strict typing, plus a collection of useful optimization ideas.

pytorch-optimizer is a collection of optimizers for PyTorch. It also includes useful optimization ideas.
Most of the implementations follow the original papers, with some tweaks added by the author.
Highly inspired by pytorch-optimizer.

Getting Started

For more, see the documentation.

Installation

$ pip3 install -U pytorch-optimizer

If you run into a version issue when installing the package, try the --no-deps option.

$ pip3 install -U --no-deps pytorch-optimizer

Simple Usage

from pytorch_optimizer import AdamP

model = YourModel()
optimizer = AdamP(model.parameters())

# Alternatively, use the optimizer loader and simply pass the name of the optimizer.

from pytorch_optimizer import load_optimizer

model = YourModel()
opt = load_optimizer(optimizer='adamp')
optimizer = opt(model.parameters())

You can also load the optimizer via torch.hub.

import torch

model = YourModel()
opt = torch.hub.load('kozistr/pytorch_optimizer', 'adamp')
optimizer = opt(model.parameters())

To build the optimizer together with its parameters and configuration, use the create_optimizer() API.

from pytorch_optimizer import create_optimizer

optimizer = create_optimizer(
    model,
    'adamp',
    lr=1e-3,
    weight_decay=1e-3,
    use_gc=True,
    use_lookahead=True,
)

Supported Optimizers

You can list the supported optimizers and learning-rate schedulers.

from pytorch_optimizer import get_supported_optimizers, get_supported_lr_schedulers

supported_optimizers = get_supported_optimizers()
supported_lr_schedulers = get_supported_lr_schedulers()

Optimizer	Description	Official Code	Paper
AdaBelief	Adapting Step-sizes by the Belief in Observed Gradients	github	https://arxiv.org/abs/2010.07468
AdaBound	Adaptive Gradient Methods with Dynamic Bound of Learning Rate	github	https://openreview.net/forum?id=Bkg3g2R9FX
AdaHessian	An Adaptive Second Order Optimizer for Machine Learning	github	https://arxiv.org/abs/2006.00719
AdamD	Improved bias-correction in Adam		https://arxiv.org/abs/2110.10828
AdamP	Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights	github	https://arxiv.org/abs/2006.08217
diffGrad	An Optimization Method for Convolutional Neural Networks	github	https://arxiv.org/abs/1909.11015v3
MADGRAD	A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization	github	https://arxiv.org/abs/2101.11075
RAdam	On the Variance of the Adaptive Learning Rate and Beyond	github	https://arxiv.org/abs/1908.03265
Ranger	a synergistic optimizer combining RAdam and LookAhead, and now GC in one optimizer	github	https://bit.ly/3zyspC3
Ranger21	a synergistic deep learning optimizer	github	https://arxiv.org/abs/2106.13731
Lamb	Large Batch Optimization for Deep Learning	github	https://arxiv.org/abs/1904.00962
Shampoo	Preconditioned Stochastic Tensor Optimization	github	https://arxiv.org/abs/1802.09568
Nero	Learning by Turning: Neural Architecture Aware Optimisation	github	https://arxiv.org/abs/2102.07227
Adan	Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models	github	https://arxiv.org/abs/2208.06677
Adai	Disentangling the Effects of Adaptive Learning Rate and Momentum	github	https://arxiv.org/abs/2006.15815
GSAM	Surrogate Gap Guided Sharpness-Aware Minimization	github	https://openreview.net/pdf?id=edONMAnhLu-
D-Adaptation	Learning-Rate-Free Learning by D-Adaptation	github	https://arxiv.org/abs/2301.07733
AdaFactor	Adaptive Learning Rates with Sublinear Memory Cost	github	https://arxiv.org/abs/1804.04235
Apollo	An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization	github	https://arxiv.org/abs/2009.13586
NovoGrad	Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks	github	https://arxiv.org/abs/1905.11286
Lion	Symbolic Discovery of Optimization Algorithms	github	https://arxiv.org/abs/2302.06675
Ali-G	Adaptive Learning Rates for Interpolation with Gradients	github	https://arxiv.org/abs/1906.05661
SM3	Memory-Efficient Adaptive Optimization	github	https://arxiv.org/abs/1901.11150

Useful Resources

This section collects several optimization ideas that regularize and stabilize training. Most of them are applied in the Ranger21 optimizer.

Most of the figures are also taken from the Ranger21 paper.

Adaptive Gradient Clipping	Gradient Centralization	Softplus Transformation
Gradient Normalization	Norm Loss	Positive-Negative Momentum
Linear learning rate warmup	Stable weight decay	Explore-exploit learning rate schedule
Lookahead	Chebyshev learning rate schedule	(Adaptive) Sharpness-Aware Minimization
On the Convergence of Adam and Beyond	Gradient Surgery for Multi-Task Learning

Adaptive Gradient Clipping

This idea was originally proposed in the NFNet (Normalizer-Free Networks) paper.

AGC (Adaptive Gradient Clipping) clips gradients based on the unit-wise ratio of gradient norms to parameter norms.

code : github
paper : arXiv
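
The clipping rule described above can be sketched in a few lines. This is a minimal, framework-free illustration rather than the library's implementation: the function names, the `clipping` and `eps` defaults, and the choice to treat each row of a weight matrix as one unit are all assumptions made for the example.

```python
import math


def unitwise_norm(row):
    """L2 norm of the values belonging to one unit (one row)."""
    return math.sqrt(sum(v * v for v in row))


def adaptive_gradient_clip(weights, grads, clipping=0.01, eps=1e-3):
    """Return clipped copies of `grads`, handling one row (unit) at a time.

    A row is rescaled only when its gradient norm exceeds
    `clipping` times its parameter norm. `clipping` and `eps`
    are illustrative defaults, not values taken from the library.
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        w_norm = max(unitwise_norm(w_row), eps)  # keep zero-initialized units trainable
        g_norm = unitwise_norm(g_row)
        max_norm = clipping * w_norm
        if g_norm > max_norm:
            scale = max_norm / g_norm
            g_row = [g * scale for g in g_row]
        clipped.append(list(g_row))
    return clipped


weights = [[3.0, 4.0], [0.3, 0.4]]  # row norms: 5.0 and 0.5
grads = [[0.03, 0.04], [0.3, 0.4]]  # row norms: 0.05 and 0.5
clipped = adaptive_gradient_clip(weights, grads, clipping=0.1)
```

In this toy input the first row is left untouched because its gradient is small relative to its weights, while the second row is scaled down so that its gradient norm equals `clipping` times its weight norm.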

Gradient Centralization

Gradient Centralization (GC) operates directly on gradients by centralizing the gradient to have zero mean.

code : github
paper : arXiv
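
As a rough illustration of the centralizing step, the sketch below subtracts the mean from each row of a gradient matrix. It is plain Python rather than the library's code, and it assumes that each row holds the gradient of one output unit; `centralize_gradient` is a hypothetical helper name.

```python
def centralize_gradient(grad):
    """Subtract each row's mean so that every row of the result has zero mean."""
    centralized = []
    for row in grad:
        mean = sum(row) / len(row)
        centralized.append([g - mean for g in row])
    return centralized


grad = [[1.0, 2.0, 3.0], [-4.0, 0.0, 4.0]]
gc = centralize_gradient(grad)
```

The first row has mean 2.0 and is shifted accordingly; the second row already has zero mean and is returned unchanged.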

Softplus Transformation

Running the final variance denominator through the softplus function lifts extremely tiny values so that they remain usable.

paper : arXiv
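
To make the effect concrete, the sketch below compares a conventional Adam-style denominator (square root plus a small epsilon) with a softplus-transformed one. It is only an illustration: the `beta` and `threshold` values and the variable names are assumptions for this example, not settings confirmed by the library.

```python
import math


def softplus(x, beta=50.0, threshold=20.0):
    """Smooth lower-bounded map: log(1 + exp(beta * x)) / beta.

    For large beta * x the function is numerically equal to x,
    so x is returned directly to avoid overflow in exp().
    """
    if beta * x > threshold:
        return x
    return math.log1p(math.exp(beta * x)) / beta


tiny_variance = 1e-16

# Conventional denominator: sqrt(v) + eps stays extremely small.
plain_denom = math.sqrt(tiny_variance) + 1e-8

# Softplus denominator: tiny values are lifted to roughly log(2) / beta.
soft_denom = softplus(math.sqrt(tiny_variance))

# Values that are already large pass through unchanged.
large_denom = softplus(1.0)
```

Dividing by `soft_denom` instead of `plain_denom` keeps the resulting step bounded when the variance estimate is close to zero, while larger denominators behave as before.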

Gradient Normalization

Norm Loss

paper : arXiv

Positive-Negative Momentum

code : github
paper : arXiv

Linear learning rate warmup

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/linear_lr_warmup.png

paper : arXiv

Stable weight decay

code : github
paper : arXiv

Explore-exploit learning rate schedule

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png

code : github
paper : arXiv

Lookahead

k steps forward, 1 step back. Lookahead keeps an exponential moving average of the weights, which is updated and substituted for the current weights every k_{lookahead} steps (5 by default).

code : github
paper : arXiv
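
The update schedule can be sketched as follows, using plain SGD as the inner optimizer. This is a minimal, framework-free illustration under stated assumptions: `lookahead_sgd`, `grad_fn`, and the `lr` and `alpha` defaults are hypothetical names and values chosen for the example; only the default of k = 5 comes from the text above.

```python
def lookahead_sgd(grad_fn, weights, lr=0.1, k=5, alpha=0.5, steps=20):
    """Run `steps` SGD updates with a Lookahead synchronization every `k` steps.

    `fast` weights are updated on every step; `slow` weights move a
    fraction `alpha` toward the fast weights every `k` steps, and the
    fast weights are then reset to the slow weights.
    """
    fast = list(weights)
    slow = list(weights)
    for step in range(1, steps + 1):
        grads = grad_fn(fast)
        fast = [w - lr * g for w, g in zip(fast, grads)]  # inner optimizer step
        if step % k == 0:
            slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
            fast = list(slow)  # substitute the averaged weights
    return fast


def quadratic_grad(w):
    """Gradient of the toy loss sum(w_i ** 2)."""
    return [2.0 * v for v in w]


one_cycle = lookahead_sgd(quadratic_grad, [1.0], steps=5)
final = lookahead_sgd(quadratic_grad, [1.0, -2.0], steps=20)
```

On this toy loss each SGD step multiplies a weight by 0.8, so after five steps the fast weight is 0.8^5 and the synchronized weight lands halfway between that value and the starting point.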

Chebyshev learning rate schedule

Acceleration via Fractal Learning Rate Schedules

paper : arXiv

(Adaptive) Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.

In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.

SAM paper : paper
ASAM paper : paper
A/SAM code : github
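
One common way to realize this is a two-pass update: first move the weights a small distance along the gradient direction, then apply the gradient measured at that perturbed point to the original weights. The sketch below shows this on a toy quadratic loss with an analytic gradient. It is not the library's implementation; `sam_step`, `grad_fn`, `rho`, and `lr` are hypothetical names and values chosen for the example.

```python
import math


def sam_step(grad_fn, weights, lr=0.1, rho=0.05):
    """Perform one SAM-style update and return the new weights."""
    # First pass: gradient at the current weights.
    grads = grad_fn(weights)
    grad_norm = math.sqrt(sum(g * g for g in grads)) + 1e-12

    # Move to the (approximate) worst-case point within radius rho.
    perturbed = [w + rho * g / grad_norm for w, g in zip(weights, grads)]

    # Second pass: gradient at the perturbed point, applied to the original weights.
    sharp_grads = grad_fn(perturbed)
    return [w - lr * g for w, g in zip(weights, sharp_grads)]


def quadratic_grad(w):
    """Gradient of the toy loss w0 ** 2 + 4 * w1 ** 2."""
    return [2.0 * w[0], 8.0 * w[1]]


single = sam_step(quadratic_grad, [1.0, 0.0])

weights = [1.0, 1.0]
for _ in range(50):
    weights = sam_step(quadratic_grad, weights)
final_loss = weights[0] ** 2 + 4.0 * weights[1] ** 2
```

Because the update always uses the gradient from the perturbed point, the weights settle within roughly `rho` of the minimum on this toy loss instead of converging to it exactly.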

On the Convergence of Adam and Beyond

paper : paper

Gradient Surgery for Multi-Task Learning

paper : paper

Citation

Please cite the original authors of the optimization algorithms. If you use this software, please cite it as below, or get the citation from the “cite this repository” button.

@software{Kim_pytorch_optimizer_Bunch_of_2022,
    author = {Kim, Hyeongchan},
    month = {1},
    title = {{pytorch_optimizer: optimizer & lr scheduler implementations in PyTorch}},
    version = {1.0.0},
    year = {2022}
}

Author

Hyeongchan Kim / @kozistr

Download files

Source Distribution

pytorch_optimizer-2.6.0.tar.gz (58.8 kB)

Uploaded Apr 22, 2023 Source

Built Distribution

pytorch_optimizer-2.6.0-py3-none-any.whl (90.7 kB)

Uploaded Apr 22, 2023 Python 3

Hashes for pytorch_optimizer-2.6.0.tar.gz
Algorithm	Hash digest
SHA256	`e86b2dcf811548a6f0943548a2cd6b5defa93b1f02e96ba473f6ce3914c192a4`
MD5	`a2ef6de1cde16ee349cecca3b185d06f`
BLAKE2b-256	`1b94c640cd469cb5851a6106a7ad7f4f66d60f90d4e1a69fb5a89ec0408c7b16`

Hashes for pytorch_optimizer-2.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7d617627cf90ff011cc2c775c36b6e5d2c6d50a756638bb87cf769b1e35b5b4f`
MD5	`90033bbf78730e2eb611f20acea99c77`
BLAKE2b-256	`ea4b2f8774abd6b13de61f3d0443a9383953b2def4aa6c20d5d641f883953379`
