Achieving Small-Batch Accuracy with Large-Batch Scalability via Adaptive Learning Rate Adjustment

29 Sep 2021  ·  Sunwoo Lee, Salman Avestimehr

We consider synchronous data-parallel neural network training with a fixed large batch size. While a large batch size provides a high degree of parallelism, it often degrades generalization performance due to the low gradient noise scale. We propose a two-phase adaptive learning rate adjustment framework that tackles this poor generalization in large-batch training. Our empirical study shows that the number of training epochs before decaying the learning rate strongly affects the final accuracy. The framework therefore performs extra epochs with the large learning rate even after the loss has flattened, and only after sufficient training under this noisy condition does it decay the learning rate, based on the loss landscape observed at run-time. Our experimental results demonstrate that the proposed heuristics and algorithm make it possible to use an extremely large batch size while maintaining model accuracy. For CIFAR-10 classification with ResNet20, our method achieves $92.66\%$ accuracy with a batch size of $8,192$, close to the $92.83\%$ achieved with a batch size of $128$, at negligible extra computational cost.
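
The following is a minimal sketch of the idea described in the abstract: keep the large learning rate for some extra epochs after the training loss flattens, and only then decay it. It is not the paper's exact algorithm; the plateau detector, the `extra_epochs` count, the `decay_factor`, and the example learning rate are illustrative assumptions.

```python
# Hypothetical two-phase adaptive LR schedule (sketch, not the paper's method).
# Phase 1: train at the large LR; after a plateau is detected, continue for
# `extra_epochs` more epochs at the same LR. Phase 2: decay the LR once.

class TwoPhaseLRSchedule:
    def __init__(self, base_lr, extra_epochs=10, decay_factor=0.1,
                 plateau_window=5, plateau_tol=1e-3):
        self.lr = base_lr                 # large initial learning rate
        self.extra_epochs = extra_epochs  # extra noisy epochs after plateau (assumed)
        self.decay_factor = decay_factor  # multiplicative decay once triggered (assumed)
        self.plateau_window = plateau_window
        self.plateau_tol = plateau_tol
        self.losses = []
        self.epochs_since_plateau = None  # None until the loss has flattened

    def _loss_flattened(self):
        # Crude plateau check: relative improvement over the last window is tiny.
        if len(self.losses) < self.plateau_window + 1:
            return False
        prev = self.losses[-self.plateau_window - 1]
        curr = self.losses[-1]
        return (prev - curr) / max(abs(prev), 1e-12) < self.plateau_tol

    def step(self, epoch_loss):
        """Call once per epoch with the observed training loss; returns the LR to use."""
        self.losses.append(epoch_loss)
        if self.epochs_since_plateau is None:
            if self._loss_flattened():
                self.epochs_since_plateau = 0     # plateau reached: start the countdown
        else:
            self.epochs_since_plateau += 1
            if self.epochs_since_plateau == self.extra_epochs:
                self.lr *= self.decay_factor      # decay only after the extra epochs
        return self.lr


# Usage sketch with a synthetic loss curve; plug the returned LR into any
# optimizer's parameter groups in real training.
sched = TwoPhaseLRSchedule(base_lr=1.6)  # example large LR for a large batch (assumed)
for epoch in range(60):
    epoch_loss = 2.0 * 0.9 ** epoch + 0.3   # stand-in for the observed training loss
    lr = sched.step(epoch_loss)
    print(f"epoch {epoch:2d}  loss {epoch_loss:.4f}  lr {lr:.4f}")
```

The design point the sketch illustrates is the decoupling of plateau detection from the decay decision: detecting a flat loss does not trigger the decay immediately, it only starts the count of additional large-LR epochs.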
