Extrapolation for Large-batch Training in Deep Learning

ICML 2020. Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi

Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that iteratively improve the model parameters by estimating a gradient on a very small fraction of the training data. A major roadblock faced when increasing the batch size to a substantial fraction of the training data for improving training time is the persistent degradation in performance (generalization gap)...
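
Since the abstract is truncated here, below is a minimal, illustrative sketch of the general extragradient-style extrapolation idea the title refers to: take a tentative SGD step, then update the actual iterate using a stochastic gradient evaluated at that extrapolated point. The toy least-squares problem, the step size lr, and the helper minibatch_grad are assumptions made for illustration; this is not the paper's exact algorithm or its hyperparameters.

```python
import numpy as np

# Toy "training data" for a least-squares problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=1024)

def minibatch_grad(w, batch_size=256):
    """Stochastic gradient of 0.5*||Xb w - yb||^2 / batch_size on a random mini-batch."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch_size

w = np.zeros(10)
lr = 0.1
for _ in range(100):
    # Extrapolation (lookahead) step: probe a tentative point...
    w_half = w - lr * minibatch_grad(w)
    # ...then update the true iterate with the gradient taken there.
    w = w - lr * minibatch_grad(w_half)

print("final loss:", 0.5 * np.mean((X @ w - y) ** 2))
```

Evaluating the gradient at the lookahead point rather than at the current iterate is what gives extrapolation its stabilizing effect on noisy updates, which is the intuition behind applying it to large-batch SGD.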
