AdaMax is a generalisation of Adam from the $l_{2}$ norm to the $l_{\infty}$ norm. Define:
$$ u_{t} = \beta^{\infty}_{2}v_{t-1} + \left(1-\beta^{\infty}_{2}\right)|g_{t}|^{\infty}$$
$$ = \max\left(\beta_{2}\cdot{v}_{t-1}, |g_{t}|\right)$$
We can plug this into the Adam update equation by replacing $\sqrt{\hat{v}_{t}} + \epsilon$ with $u_{t}$ to obtain the AdaMax update rule. Because $u_{t}$ relies on the max operation, it is not biased towards zero the way $v_{t}$ is in Adam, so no bias correction for $u_{t}$ is needed:
$$ \theta_{t+1} = \theta_{t} - \frac{\eta}{u_{t}}\hat{m}_{t} $$
Common default values are $\eta = 0.002$ and $\beta_{1}=0.9$ and $\beta_{2}=0.999$.
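The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation; the function name `adamax_update` and the 1-based step counter `t` are assumptions for the example.

```python
import numpy as np

def adamax_update(theta, g, m, u, t, eta=0.002, beta1=0.9, beta2=0.999):
    """One AdaMax step (illustrative helper; symbols follow the text).

    theta: parameters, g: gradient, m: first-moment estimate,
    u: infinity-norm-based second moment, t: step count (1-based).
    """
    m = beta1 * m + (1 - beta1) * g        # biased first-moment estimate
    u = np.maximum(beta2 * u, np.abs(g))   # u_t = max(beta2 * u_{t-1}, |g_t|)
    m_hat = m / (1 - beta1 ** t)           # bias-corrected first moment
    theta = theta - eta * m_hat / u        # theta_{t+1} = theta_t - (eta / u_t) * m_hat
    return theta, m, u
```

Note that only the first moment needs bias correction; `u` is left uncorrected, matching the update rule above.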
Source: Adam: A Method for Stochastic Optimization (22 Dec 2014)