AdaMax

Introduced by Kingma et al. in Adam: A Method for Stochastic Optimization

AdaMax is a generalisation of Adam from the $l_{2}$ norm to the $l_{\infty}$ norm. Define:

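$$ u_{t} = \lim_{p \to \infty}\left(\beta^{p}_{2}v_{t-1} + \left(1-\beta^{p}_{2}\right)|g_{t}|^{p}\right)^{1/p}$$

$$ = \max\left(\beta_{2}\cdot u_{t-1}, |g_{t}|\right)$$

where $v_{t-1}$ here denotes the $l_{p}$ generalisation of Adam's second-moment accumulator.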
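The max form can be motivated by a short limit argument, following the reasoning in the Adam paper. This is a sketch: it assumes an $l_{p}$ accumulator $v_{t} = \beta_{2}^{p}v_{t-1} + \left(1-\beta_{2}^{p}\right)|g_{t}|^{p}$ initialised at $v_{0} = 0$, which unrolls to

$$ v_{t} = \left(1-\beta_{2}^{p}\right)\sum_{i=1}^{t}\beta_{2}^{p(t-i)}|g_{i}|^{p} $$

Taking the $p$-th root and letting $p \to \infty$, the factor $\left(1-\beta_{2}^{p}\right)^{1/p}$ tends to 1 and the remaining $l_{p}$ norm tends to a maximum:

$$ \lim_{p \to \infty} v_{t}^{1/p} = \lim_{p \to \infty}\left(\sum_{i=1}^{t}\left(\beta_{2}^{t-i}|g_{i}|\right)^{p}\right)^{1/p} = \max\left(\beta_{2}^{t-1}|g_{1}|, \ldots, \beta_{2}|g_{t-1}|, |g_{t}|\right) $$

This maximum satisfies the recursion $u_{t} = \max\left(\beta_{2}\cdot u_{t-1}, |g_{t}|\right)$ with $u_{0} = 0$.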

We can plug this into the Adam update equation by replacing $\sqrt{\hat{v}_{t}} + \epsilon$ with $u_{t}$ to obtain the AdaMax update rule:

$$ \theta_{t+1} = \theta_{t} - \frac{\eta}{u_{t}}\hat{m}_{t} $$

Common default values are $\eta = 0.002$, $\beta_{1}=0.9$, and $\beta_{2}=0.999$.
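The update rule and defaults above can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the names `adamax_step` and `minimize_quadratic` are hypothetical, the test objective is an arbitrary choice, and the small `eps` added to $u_{t}$ is an assumption made here to avoid division by zero when a gradient is exactly zero (it does not appear in the update rule above).

```python
import numpy as np


def adamax_step(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaMax update. `t` is the 1-based step count."""
    # First-moment estimate, as in Adam.
    m = beta1 * m + (1.0 - beta1) * grad
    # Infinity-norm accumulator: u_t = max(beta2 * u_{t-1}, |g_t|).
    u = np.maximum(beta2 * u, np.abs(grad))
    # Bias-correct the first moment only; u needs no correction.
    m_hat = m / (1.0 - beta1 ** t)
    # eps is an assumption for numerical safety, not part of the rule above.
    theta = theta - lr * m_hat / (u + eps)
    return theta, m, u


def minimize_quadratic(steps=200, lr=0.002):
    """Run AdaMax on f(theta) = sum(theta**2), whose gradient is 2 * theta."""
    theta = np.array([1.0, -2.0])
    m = np.zeros_like(theta)
    u = np.zeros_like(theta)
    for t in range(1, steps + 1):
        grad = 2.0 * theta
        theta, m, u = adamax_step(theta, grad, m, u, t, lr=lr)
    return theta


if __name__ == "__main__":
    print(minimize_quadratic())
```

On the first step the bias-corrected $\hat{m}_{1}$ equals $g_{1}$ and $u_{1} = |g_{1}|$, so each coordinate moves by almost exactly $\eta$ regardless of the gradient's scale.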

Source: Adam: A Method for Stochastic Optimization

Latest Papers

| PAPER | AUTHORS | DATE |
| --- | --- | --- |
| NVAE: A Deep Hierarchical Variational Autoencoder | Arash Vahdat, Jan Kautz | 2020-07-08 |
| Adam: A Method for Stochastic Optimization | Diederik P. Kingma, Jimmy Ba | 2014-12-22 |

Tasks

| TASK | PAPERS | SHARE |
| --- | --- | --- |
| Image Generation | 1 | 100.00% |
