19 Jun 2018 • Matteo Fischetti, Iacopo Mandatelli, Domenico Salvagnin
It is well known that, for most datasets, training with large minibatches in Stochastic Gradient Descent (SGD) typically leads to slower convergence and poorer generalization than training with small ones.
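To make the minibatch-size knob concrete, here is a minimal sketch of SGD on a toy least-squares problem. The function `sgd_linear_regression`, the learning rate, and the toy data are all illustrative assumptions, not the paper's experimental setup; the sketch only shows how `batch_size` partitions each epoch into gradient updates.

```python
import numpy as np

def sgd_linear_regression(X, y, batch_size=32, lr=0.01, epochs=50, seed=0):
    """Toy SGD on a least-squares objective.

    batch_size controls the minibatch size discussed in the text:
    each epoch shuffles the data and performs one gradient step per
    minibatch, so smaller batches mean more (noisier) updates per epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of mean squared error on the minibatch
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(batch)
            w -= lr * grad
    return w

# Hypothetical toy data: y = 3*x0 - 2*x1 plus small noise
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 2))
y = X @ np.array([3.0, -2.0]) + 0.01 * rng.normal(size=256)

w_small = sgd_linear_regression(X, y, batch_size=8)    # many updates/epoch
w_large = sgd_linear_regression(X, y, batch_size=256)  # one update/epoch
```

On this toy problem, with the epoch budget fixed, the small-batch run takes many more gradient steps and so gets closer to the true weights; this mirrors the convergence effect the abstract refers to, though generalization effects only show up on real held-out data.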