no code implementations • 28 Jul 2020 • Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li
Empirical results on deep learning tasks show that, with the same large batch size, SNGM achieves better test accuracy than momentum SGD (MSGD) and other state-of-the-art large-batch training methods.
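For reference, below is a minimal sketch of a normalized-momentum update of the kind SNGM describes, written in NumPy. The hyperparameter values and the exact placement of the normalization are assumptions for illustration, not the authors' reference implementation:

```python
import numpy as np

def sngm_step(w, u, grad, lr=0.1, beta=0.9, eps=1e-8):
    """One SNGM-style update: momentum applied to the *normalized*
    stochastic gradient (a sketch, not the paper's official code).

    w    -- parameter vector
    u    -- momentum buffer, same shape as w
    grad -- stochastic gradient at w
    Normalizing by ||grad|| makes the step length insensitive to the
    gradient's magnitude, which is the property that helps with
    large-batch training.
    """
    u = beta * u + grad / (np.linalg.norm(grad) + eps)
    w = w - lr * u
    return w, u
```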
no code implementations • 26 Feb 2020 • Shen-Yi Zhao, Yin-Peng Xie, Wu-Jun Li
We theoretically prove that, compared to classical stagewise SGD, which decreases the learning rate by stage, SEBS can reduce the number of parameter updates without increasing generalization error.
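To illustrate the stagewise schedule, here is a small NumPy training-loop skeleton on a least-squares objective that keeps the learning rate fixed and enlarges the batch size at each stage. The enlargement factor `rho`, the stage lengths, and the toy objective are hypothetical choices, not values from the paper:

```python
import numpy as np

def sebs_least_squares(X, y, w, lr=0.01, batch_size=32, rho=2,
                       stages=3, steps_per_stage=200, rng=None):
    """Stagewise SGD on 0.5*||Xw - y||^2: the learning rate stays fixed
    and the batch size is multiplied by rho at the end of each stage,
    mirroring SEBS's idea of enlarging the batch size by stage instead
    of decaying the learning rate (a sketch under assumed settings)."""
    rng = rng or np.random.default_rng(0)
    n = X.shape[0]
    for stage in range(stages):
        for _ in range(steps_per_stage):
            idx = rng.choice(n, size=min(batch_size, n), replace=False)
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad
        batch_size *= rho  # enlarge the batch by stage; lr is unchanged
    return w
```

With a fixed learning rate, each stage's larger batch lowers gradient variance, which plays the role that learning-rate decay plays in classical stagewise SGD while requiring fewer parameter updates per epoch.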
no code implementations • 30 May 2019 • Chang-Wei Shi, Shen-Yi Zhao, Yin-Peng Xie, Hao Gao, Wu-Jun Li
With the rapid growth of data, distributed momentum stochastic gradient descent (DMSGD) has been widely used in distributed learning, especially for training large-scale deep models.
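For context, the following is a minimal single-process simulation of one synchronous DMSGD step: each worker computes a local stochastic gradient on its data shard, the gradients are averaged (standing in for an all-reduce in a real deployment), and a shared momentum update is applied. The worker setup, toy objective, and hyperparameters are illustrative assumptions:

```python
import numpy as np

def dmsgd_step(shards, w, u, lr=0.1, beta=0.9, rng=None):
    """One synchronous DMSGD step, simulated on a single machine.

    shards -- list of (X_k, y_k) local datasets, one per worker
    Each worker draws a minibatch from its shard and computes a local
    gradient of 0.5*||Xw - y||^2; the averaged gradient drives a
    momentum SGD update on the shared parameters.
    """
    rng = rng or np.random.default_rng(0)
    grads = []
    for X_k, y_k in shards:
        idx = rng.choice(len(X_k), size=min(32, len(X_k)), replace=False)
        grads.append(X_k[idx].T @ (X_k[idx] @ w - y_k[idx]) / len(idx))
    g = np.mean(grads, axis=0)  # aggregate across workers (all-reduce)
    u = beta * u + g            # shared momentum buffer
    w = w - lr * u
    return w, u
```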