Stochastic Optimization

SGDW

Introduced by Loshchilov and Hutter in Decoupled Weight Decay Regularization

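SGDW is a variant of SGD with momentum that decouples weight decay from the gradient update: the decay term $\lambda\theta_{t-1}$ is applied directly to the weights instead of being added to the gradient as an L2 penalty. With learning rate $\alpha$, momentum factor $\beta_{1}$, weight decay factor $\lambda$, and schedule multiplier $\eta_{t}$, the update is:

$$ g_{t} = \nabla{f_{t}}\left(\theta_{t-1}\right)$$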

$$ m_{t} = \beta_{1}m_{t-1} + \eta_{t}\alpha{g}_{t}$$

$$ \theta_{t} = \theta_{t-1} - m_{t} - \eta_{t}\lambda\theta_{t-1}$$

Source: Decoupled Weight Decay Regularization
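
The update above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `sgdw_step`, the argument names (`weight_decay` for $\lambda$, `eta` for $\eta_{t}$), the default values, and the toy quadratic objective are all hypothetical choices made for the example.

```python
from typing import List, Tuple


def sgdw_step(
    theta: List[float],
    m: List[float],
    grad: List[float],
    alpha: float = 0.1,          # base learning rate (alpha); assumed value
    beta1: float = 0.9,          # momentum factor (beta_1); assumed value
    weight_decay: float = 0.01,  # decay factor (lambda); assumed value
    eta: float = 1.0,            # schedule multiplier (eta_t); assumed constant
) -> Tuple[List[float], List[float]]:
    """Apply one SGDW update and return the new (theta, m)."""
    # g_t is the plain loss gradient: no lambda * theta term is added to it.
    # m_t = beta_1 * m_{t-1} + eta_t * alpha * g_t
    new_m = [beta1 * mi + eta * alpha * gi for mi, gi in zip(m, grad)]
    # theta_t = theta_{t-1} - m_t - eta_t * lambda * theta_{t-1}
    new_theta = [
        ti - mi - eta * weight_decay * ti for ti, mi in zip(theta, new_m)
    ]
    return new_theta, new_m


# Toy objective (hypothetical): f(theta) = 0.5 * sum((theta_i - target_i)^2),
# whose gradient is simply theta - target.
target = [1.0, -2.0]
theta, m = [0.0, 0.0], [0.0, 0.0]
for _ in range(200):
    grad = [t - c for t, c in zip(theta, target)]
    theta, m = sgdw_step(theta, m, grad)
```

With these toy settings the iterates settle near `target / 1.01`, slightly shrunk toward zero by the decay term. Because the decay enters only through the final weight update, it is never accumulated in the momentum buffer `m`, which is the practical difference from adding an L2 penalty to the gradient.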

Tasks


| Task | Papers | Share |
|---|---|---|
| Image Classification | 1 | 100.00% |
