1 code implementation • ICML 2020 • Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin
When using large-batch training to speed up stochastic gradient descent, learning rates must adapt to new batch sizes in order to maximize speed-ups and preserve model quality.
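To make the problem concrete, a minimal sketch of the widely used linear scaling heuristic (Goyal et al., 2017) is shown below. This is an illustration of why learning rates must track batch size, not the adaptation rule proposed in this paper; all names (`scaled_learning_rate`, `base_lr`, `base_batch_size`) are illustrative assumptions.

```python
def scaled_learning_rate(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Linear scaling heuristic: grow the learning rate in proportion
    to the batch size, relative to the batch size it was tuned at.
    This is a common fixed rule, not the paper's adaptive method."""
    return base_lr * (batch_size / base_batch_size)

# Example: a recipe tuned at batch size 256 with lr 0.1, scaled up to batch size 2048.
if __name__ == "__main__":
    lr = scaled_learning_rate(base_lr=0.1, base_batch_size=256, batch_size=2048)
    print(f"scaled learning rate: {lr}")  # 0.8
```

Fixed rules like this require re-tuning (and typically warmup) at each new scale; adapting the learning rate to the batch size automatically, as the paper studies, avoids that per-scale tuning.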
no code implementations • 25 Sep 2019 • Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin
When using distributed training to speed up stochastic gradient descent, learning rates must adapt to new scales (larger effective batch sizes) in order to maintain training effectiveness.