AdaMod

Introduced by Ding et al. in An Adaptive and Momental Bound Method for Stochastic Learning

AdaMod is a stochastic optimizer that restricts adaptive learning rates with adaptive and momental upper bounds. The dynamic learning rate bounds are based on exponential moving averages of the adaptive learning rates themselves, which smooth out unexpectedly large learning rates and stabilize the training of deep neural networks.
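To illustrate the smoothing effect, the following minimal sketch clips each value in a made-up sequence of learning rates to the exponential moving average of the rates seen so far. The function name `momental_bound`, the input values, the coefficient `beta = 0.9`, and the zero initial value of the average are assumptions chosen for illustration, not values taken from the paper.

```python
def momental_bound(rates, beta=0.9):
    """Clip each rate to the exponential moving average of the rates so far."""
    s = 0.0  # assumed initial value of the moving average
    bounded = []
    for eta in rates:
        # Update the moving average with the current rate, then use it as a cap.
        s = beta * s + (1.0 - beta) * eta
        bounded.append(min(eta, s))
    return bounded


# Hypothetical sequence: steady rates of 0.01 with a single spike of 1.0.
rates = [0.01] * 50 + [1.0] + [0.01] * 5
bounded = momental_bound(rates)
```

In this sketch the spike of 1.0 is cut to roughly 0.11, while the ordinary rates that follow it pass through unchanged. Because the average starts at zero, the first few rates are also scaled down until the average catches up.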

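The weight updates at step $t$ are performed as follows, where $\theta$ denotes the parameters, $g_{t}$ the gradient, $m_{t}$ and $v_{t}$ the first and second moment estimates, $\eta_{t}$ the adaptive learning rates, and $s_{t}$ their exponential moving average: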

$$ g_{t} = \nabla{f}_{t}\left(\theta_{t-1}\right) $$

$$ m_{t} = \beta_{1}m_{t-1} + \left(1-\beta_{1}\right)g_{t} $$

$$ v_{t} = \beta_{2}v_{t-1} + \left(1-\beta_{2}\right)g_{t}^{2} $$

$$ \hat{m}_{t} = m_{t} / \left(1 - \beta^{t}_{1}\right)$$

$$ \hat{v}_{t} = v_{t} / \left(1 - \beta^{t}_{2}\right)$$

$$ \eta_{t} = \alpha_{t} / \left(\sqrt{\hat{v}_{t}} + \epsilon\right) $$

$$ s_{t} = \beta_{3}s_{t-1} + (1-\beta_{3})\eta_{t} $$

$$ \hat{\eta}_{t} = \min\left(\eta_{t}, s_{t}\right) $$

$$ \theta_{t} = \theta_{t-1} - \hat{\eta}_{t}\hat{m}_{t} $$

Source: An Adaptive and Momental Bound Method for Stochastic Learning
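The update rule above can be sketched in NumPy as a single step function. This is a minimal sketch, not the authors' implementation: the names `adamod_step` and `init_state`, the constant step size `alpha` (standing in for $\alpha_{t}$), the default hyperparameter values, and the zero initialization of $m$, $v$, and $s$ are all assumptions made for illustration.

```python
import numpy as np


def init_state(theta):
    """Create optimizer state; zero initialization is an assumption."""
    z = np.zeros_like(theta)
    return {"t": 0, "m": z.copy(), "v": z.copy(), "s": z.copy()}


def adamod_step(theta, grad, state, alpha=1e-3, beta1=0.9, beta2=0.999,
                beta3=0.999, eps=1e-8):
    """Apply one AdaMod update and return the new parameters.

    Default hyperparameters are illustrative assumptions.
    """
    state["t"] += 1
    t = state["t"]
    # First and second moment estimates (m_t, v_t).
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad ** 2
    # Bias-corrected estimates.
    m_hat = state["m"] / (1.0 - beta1 ** t)
    v_hat = state["v"] / (1.0 - beta2 ** t)
    # Per-parameter adaptive learning rate eta_t.
    eta = alpha / (np.sqrt(v_hat) + eps)
    # Exponential moving average of the learning rates, s_t.
    state["s"] = beta3 * state["s"] + (1.0 - beta3) * eta
    # Bound each learning rate from above by its moving average.
    eta_hat = np.minimum(eta, state["s"])
    return theta - eta_hat * m_hat


# Usage: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
start = np.array([1.0, -2.0, 3.0])
theta = start.copy()
state = init_state(theta)
for _ in range(500):
    theta = adamod_step(theta, theta, state)
```

With $s$ initialized to zero, the bound $s_{t}$ starts far below $\eta_{t}$ and grows over time, so in this sketch the earliest steps are very small and the parameters move toward the minimum only gradually.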

Categories

Stochastic Optimization
