no code implementations • 14 Jan 2021 • Congliang Chen, Li Shen, Fangyu Zou, Wei Liu
Adam is one of the most influential adaptive stochastic algorithms for training deep neural networks, yet it has been shown to diverge even in the simple convex setting by a few elementary counterexamples.
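The counterexamples alluded to are plausibly of the kind introduced by Reddi et al. in "On the Convergence of Adam and Beyond" (ICLR 2018); the following one-dimensional online convex problem is a representative instance (an assumption here, not a construction taken from this paper):

    f_t(x) = \begin{cases} Cx, & t \bmod 3 = 1 \\ -x, & \text{otherwise} \end{cases}, \qquad x \in [-1, 1], \; C > 2.

The average loss is minimized at x = -1, yet Adam drifts toward x = +1 because the rare large gradient C is damped by the adaptive step size.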
no code implementations • CVPR 2019 • Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu
Adam and RMSProp are two of the most influential adaptive stochastic algorithms for training deep neural networks, yet both have been shown to diverge even in the convex setting by a few simple counterexamples.
no code implementations • 10 Aug 2018 • Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu
Integrating adaptive learning rate and momentum techniques into SGD leads to a large class of efficient accelerated adaptive stochastic algorithms, such as AdaGrad, RMSProp, Adam, and AccAdaGrad.
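As a minimal sketch of the shared template these methods instantiate, here is the standard Adam update of Kingma and Ba in Python (the function name adam_step is illustrative, not taken from the paper):

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update; m and v carry the momentum and adaptive-rate state."""
        m = beta1 * m + (1 - beta1) * grad       # momentum: EMA of gradients
        v = beta2 * v + (1 - beta2) * grad**2    # adaptive rate: EMA of squared gradients
        m_hat = m / (1 - beta1**t)               # bias correction (t starts at 1)
        v_hat = v / (1 - beta2**t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # coordinate-wise step
        return param, m, v

RMSProp is recovered by setting beta1 = 0, and AdaGrad by replacing the exponential moving average v with a running sum of squared gradients.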