Stochastic Optimization

Stochastic optimization methods are used to optimize neural networks. At each step we typically sample a mini-batch of data, hence 'stochastic', and perform a gradient descent update computed from that mini-batch, as in the sketch below. Following the sketch is a continuously updated list of stochastic optimization algorithms.

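As a concrete illustration, here is a minimal sketch of mini-batch stochastic gradient descent on a simple least-squares problem. The function name `minibatch_sgd`, the synthetic data, and the hyperparameter values are illustrative assumptions, not taken from any particular library or paper.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.01, batch_size=32, epochs=10, seed=0):
    """Fit weights w minimizing the mean squared error ||Xw - y||^2 with mini-batch SGD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)                 # shuffle the data once per epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]  # draw the next mini-batch
            Xb, yb = X[idx], y[idx]
            # Stochastic gradient of the mean squared error on this mini-batch
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad                        # plain gradient descent step
    return w

# Usage (illustrative): recover a known weight vector from noisy observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.1 * rng.normal(size=1000)
print(minibatch_sgd(X, y, lr=0.05, epochs=50))
```

The methods listed below differ mainly in how they modify this basic update, for example by adding momentum or per-parameter adaptive learning rates.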
METHOD                          YEAR   PAPERS
Adam                            2014     2692
SGD                             1951      757
RMSProp                         2013      124
SGD with Momentum               1999      115
AdaGrad                         2011       59
LAMB                            2019       24
AMSGrad                         2019       17
Nesterov Accelerated Gradient   1983       16
Adafactor                       2018       10
LARS                            2017        9
AdamW                           2017        8
AdaBound                        2019        7
NT-ASGD                         2017        7
AdaDelta                        2012        6
Stochastic Weight Averaging     2018        6
RAdam                           2019        4
NADAM                           2015        4
Lookahead                       2019        3
AdaMax                          2014        2
Polyak Averaging                1991        2
QHM                             2018        2
AdaShift                        2018        2
Demon                           2019        1
AggMo                           2018        1
YellowFin                       2017        1
SGDW                            2017        1
AMSBound                        2019        1
QHAdam                          2018        1
Demon CM                        2019        1
AdaSqrt                         2019        1
AdaMod                          2019        1
Demon ADAM                      2019        1