Stochastic Optimization

Demon ADAM

Introduced by Chen et al. in Demon: Improved Neural Network Training with Momentum Decay

Demon Adam is a stochastic optimizer in which the Demon momentum decay rule is applied to the Adam optimizer: the momentum parameter is decayed from its initial value $\beta_{init}$ toward zero over the course of $T$ training steps.

$$ \beta_{t} = \beta_{init}\cdot\frac{\left(1-\frac{t}{T}\right)}{\left(1-\beta_{init}\right) + \beta_{init}\left(1-\frac{t}{T}\right)} $$
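For example, with $\beta_{init} = 0.9$ the schedule gives $\beta_{t} = 0.9$ at the start of training ($t = 0$), $\beta_{t} = \frac{0.9 \cdot 0.5}{0.1 + 0.45} \approx 0.82$ at the midpoint ($t = T/2$), and $\beta_{t} = 0$ at the final step ($t = T$), so the momentum contribution is gradually annealed away.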

$$ m_{t, i} = g_{t, i} + \beta_{t}m_{t-1, i} $$

$$ v_{t, i} = \beta_{2}v_{t-1, i} + \left(1-\beta_{2}\right)g^{2}_{t, i} $$

$$ \theta_{t, i} = \theta_{t-1, i} - \eta\frac{m_{t, i}}{\sqrt{v_{t, i}} + \epsilon} $$
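
The update above can be written as a short optimizer loop. Below is a minimal NumPy sketch of Demon Adam based on the equations in this section; the class name, the argument names (e.g. `total_steps`), and the toy quadratic example are illustrative assumptions, not taken from the paper's released code.

```python
# Minimal sketch of the Demon Adam update described above (assumed names, not the
# authors' implementation).
import numpy as np


class DemonAdam:
    """Adam with the Demon momentum-decay schedule applied to the momentum parameter."""

    def __init__(self, lr=1e-3, beta_init=0.9, beta2=0.999, eps=1e-8, total_steps=10_000):
        self.lr = lr
        self.beta_init = beta_init
        self.beta2 = beta2
        self.eps = eps
        self.total_steps = total_steps  # T in the decay schedule
        self.t = 0
        self.m = None  # first moment (momentum buffer)
        self.v = None  # second moment (running average of squared gradients)

    def _beta_t(self):
        # Demon schedule: decays beta_t from beta_init toward 0 as t approaches T.
        frac = max(1.0 - self.t / self.total_steps, 0.0)
        return self.beta_init * frac / ((1.0 - self.beta_init) + self.beta_init * frac)

    def step(self, params, grads):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        beta_t = self._beta_t()
        # m_t = g_t + beta_t * m_{t-1}
        self.m = grads + beta_t * self.m
        # v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grads ** 2
        # theta_t = theta_{t-1} - lr * m_t / (sqrt(v_t) + eps)
        return params - self.lr * self.m / (np.sqrt(self.v) + self.eps)


# Toy usage: minimize f(x) = ||x||^2, whose gradient is 2x.
opt = DemonAdam(lr=0.05, total_steps=1000)
x = np.array([3.0, -2.0])
for _ in range(1000):
    x = opt.step(x, 2.0 * x)
print(x)  # approaches the minimum at the origin
```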

Source: Demon: Improved Neural Network Training with Momentum Decay
