no code implementations • 29 Sep 2021 • Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda
Stochastic gradient descent (SGD) is subject to complicated multiplicative noise when the loss is the mean-square error.
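The claim can be seen in a toy example. The NumPy sketch below (an illustration under assumed data, not code from the paper) fits a 1D linear model with the mean-square loss and shows that the variance of a minibatch gradient grows with the distance from the minimum, i.e. the SGD noise is state-dependent (multiplicative) rather than additive; the dataset, batch size, and helper `minibatch_grad` are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(size=N)
y = 2.0 * x + 0.1 * rng.normal(size=N)  # toy 1D regression, true slope 2

def minibatch_grad(w, batch_size=32):
    """One stochastic gradient of the mean-square loss L(w) = mean((w*x - y)**2) / 2."""
    idx = rng.integers(N, size=batch_size)
    return np.mean((w * x[idx] - y[idx]) * x[idx])

# The variance of the minibatch gradient depends on where w sits:
# far from the minimum the residuals are large and so is the noise.
for w in (2.0, 0.0, 10.0):
    grads = np.array([minibatch_grad(w) for _ in range(5_000)])
    print(f"w = {w:5.1f}:  Var[g] = {grads.var():.4f}")
```

Running it shows Var[g] is smallest near the minimizer w = 2 and grows as w moves away, which is the signature of multiplicative noise.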
no code implementations • ICLR 2022 • Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda
The noise in stochastic gradient descent (SGD), caused by minibatch sampling, is poorly understood despite its practical importance in deep learning.
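As a hedged illustration of the minibatch-sampling noise the abstract refers to, the sketch below empirically estimates the covariance of minibatch gradients for an assumed two-parameter linear model; it is a generic construction rather than the paper's method, and every constant in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, B = 5_000, 2, 16
X = rng.normal(size=(N, d))
y = X @ np.array([1.0, -1.0]) + 0.1 * rng.normal(size=N)
w = np.zeros(d)  # probe the noise away from the minimum

full_grad = X.T @ (X @ w - y) / N

# Draw many minibatch gradients and estimate the noise covariance
# C = Cov[g_batch]; its structure (anisotropy, state dependence) is
# what theories of SGD noise try to characterise.
G = np.array([
    X[idx].T @ (X[idx] @ w - y[idx]) / B
    for idx in (rng.integers(N, size=B) for _ in range(20_000))
])
print("full-batch gradient:    ", full_grad)
print("mean minibatch gradient:", G.mean(axis=0))  # unbiased estimate
print("noise covariance C:\n", np.cov(G.T))
```

The minibatch gradient is unbiased, so all of the interesting behavior sits in the covariance C, which here is neither isotropic nor independent of w.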
no code implementations • 7 Dec 2020 • Kangqiao Liu, Liu Ziyin, Masahito Ueda
In the vanishing-learning-rate regime, stochastic gradient descent (SGD) is now relatively well understood.
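A minimal sketch of what that regime looks like, under assumed toy data (not the paper's setup): holding the continuous time T = eta * steps fixed while shrinking eta, SGD tracks the deterministic gradient flow and its run-to-run fluctuations shrink with eta.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2_000
x = rng.normal(size=N)
y = 3.0 * x + 0.2 * rng.normal(size=N)  # toy problem, minimizer near w = 3

def run_sgd(eta, steps, w0=0.0, batch_size=8):
    """Plain SGD on the 1D mean-square loss."""
    w = w0
    for _ in range(steps):
        idx = rng.integers(N, size=batch_size)
        w -= eta * np.mean((w * x[idx] - y[idx]) * x[idx])
    return w

# Hold the continuous time T = eta * steps fixed while eta -> 0:
# the final iterate concentrates on the gradient-flow solution.
T = 5.0
for eta in (0.1, 0.01, 0.001):
    finals = [run_sgd(eta, int(T / eta)) for _ in range(20)]
    print(f"eta = {eta:6.3f}:  mean w = {np.mean(finals):.4f}, std = {np.std(finals):.2e}")
```

At fixed T the mean final iterate is essentially independent of eta, while the spread across runs shrinks as eta decreases, which is why the finite-learning-rate case studied here requires a separate analysis.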