25 Sep 2019 • Kaiwen Zhou, Yanghua Jin, Qinghua Ding, James Cheng
Stochastic Gradient Descent (SGD) with Nesterov's momentum is a widely used optimizer in deep learning and is observed to have excellent generalization performance.
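For reference, the Nesterov-momentum update evaluates the gradient at a "look-ahead" point before applying the velocity step. The sketch below is a minimal illustration on a toy quadratic objective, not the paper's method; the function name, step size, and momentum value are illustrative assumptions.

```python
import numpy as np

def nesterov_sgd(grad, theta0, lr=0.1, momentum=0.9, steps=100):
    """Gradient descent with Nesterov momentum (look-ahead gradient).

    In true SGD, `grad` would return a stochastic gradient estimate;
    here it is deterministic for simplicity.
    """
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(steps):
        g = grad(theta + momentum * v)  # gradient at the look-ahead point
        v = momentum * v - lr * g       # update velocity
        theta = theta + v               # take the step
    return theta

# Toy objective f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself;
# the minimizer is the origin.
theta_star = nesterov_sgd(lambda th: th, theta0=[4.0, -2.0])
```

Evaluating the gradient at `theta + momentum * v` rather than at `theta` is what distinguishes Nesterov's momentum from classical (heavy-ball) momentum.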