1 code implementation • 7 Sep 2021 • Anirudh Maiya, Inumella Sricharan, Anshuman Pandey, Srinivas K. S
To eliminate the dependence on learning rate schedulers, adaptive gradient optimizers such as AdaGrad, AdaDelta, RMSProp, and Adam employ a parameter-wise scaling term for the learning rate, which is itself a function of the gradient.
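As a rough illustration of this parameter-wise scaling (not code from the paper), the sketch below implements an RMSProp-style update in NumPy: each parameter's effective step size is divided by a running estimate of its own squared gradients. The function name, hyperparameter defaults, and state handling are illustrative assumptions.

```python
import numpy as np

def rmsprop_step(params, grads, state=None, lr=1e-3, rho=0.9, eps=1e-8):
    """One RMSProp-style update.

    Each parameter's step is scaled by a running average of its own
    squared gradients, so the effective learning rate is parameter-wise
    and depends on the gradient history rather than on a fixed schedule.
    """
    if state is None:
        state = np.zeros_like(params)
    # Exponential moving average of squared gradients (element-wise).
    state = rho * state + (1.0 - rho) * grads ** 2
    # Per-parameter effective learning rate: lr / (sqrt(E[g^2]) + eps).
    params = params - lr * grads / (np.sqrt(state) + eps)
    return params, state

# Example usage on a toy quadratic loss 0.5 * ||w||^2 (gradient = w):
w = np.array([1.0, -2.0, 3.0])
s = None
for _ in range(100):
    g = w
    w, s = rmsprop_step(w, g, s)
```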