no code implementations • 29 Sep 2021 • Zhize Li, Slavomir Hanzely, Peter Richtárik
Avoiding full gradient computations, which are time-consuming, is important in many applications because the number of data samples $n$ is usually very large.
no code implementations • 3 May 2020 • Motasem Alfarra, Slavomir Hanzely, Alyazeed Albasyoni, Bernard Ghanem, Peter Richtarik
Recent advances in the theoretical understanding of SGD led to a formula for the optimal batch size minimizing the number of effective data passes, i.e., the number of iterations times the batch size.