Stochastic Optimization

Stochastic optimization methods are used to train neural networks. Rather than computing the gradient over the full dataset, we typically sample a mini-batch of data (hence 'stochastic') and perform a gradient-descent step using only that mini-batch. A minimal sketch of this idea appears below, followed by a continuously updated list of stochastic optimization algorithms.
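To make the mini-batch idea concrete, here is a minimal sketch of plain mini-batch SGD on a toy linear-regression problem using NumPy. The data, loss, learning rate, and batch size are illustrative assumptions for this sketch, not tied to any particular method in the list below.

```python
import numpy as np

# Illustrative data: linear regression with a known weight vector (assumption for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(10)     # parameters to optimize
lr = 0.05            # learning rate (assumed value)
batch_size = 32      # mini-batch size (assumed value)

for epoch in range(20):
    perm = rng.permutation(len(X))            # shuffle so each mini-batch is a random sample
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Gradient of the mean squared error on the mini-batch only (the 'stochastic' gradient).
        grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
        w -= lr * grad                        # plain SGD update step

print("estimation error:", np.linalg.norm(w - true_w))
```

The methods listed below differ mainly in how they scale, accumulate, or average this per-mini-batch gradient before applying the update (e.g. momentum, adaptive per-parameter step sizes, or weight averaging).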

METHOD                          YEAR   PAPERS
Adam                            2014     3166
SGD                             1951      825
RMSProp                         2013      174
SGD with Momentum               1999      118
AdaGrad                         2011       64
LAMB                            2019       37
Adafactor                       2018       22
AMSGrad                         2019       20
Nesterov Accelerated Gradient   1983       17
LARS                            2017       11
AdamW                           2017       11
Stochastic Weight Averaging     2018        8
AdaBound                        2019        6
NT-ASGD                         2017        6
AdaDelta                        2012        6
Lookahead                       2019        5
RAdam                           2019        5
NADAM                           2015        3
AdaShift                        2018        2
AdaMax                          2014        2
Polyak Averaging                1991        2
QHM                             2018        2
Apollo                          2020        2
Demon CM                        2019        1
AdaSqrt                         2019        1
AdaMod                          2019        1
Demon ADAM                      2019        1
Demon                           2019        1
AggMo                           2018        1
YellowFin                       2017        1
SGDW                            2017        1
Distributed Shampoo             2021        1
AMSBound                        2019        1
QHAdam                          2018        1
MPSO                            2020        1