Trainable Learning Rate

29 Sep 2021  ·  George Retsinas, Giorgos Sfikas, Panagiotis Filntisis, Petros Maragos

Selecting an appropriate learning rate for efficiently training deep neural networks is a difficult process that can be affected by numerous parameters, such as the dataset, the model architecture, or even the batch size. In this work, we propose an algorithm for automatically adjusting the learning rate during gradient descent. The rationale behind our approach is to train the learning rate along with the model weights, akin to line search. Contrary to existing approaches, the learning rate is optimized via a simple extra gradient descent step, justified by an analysis that takes into consideration the structure of a neural network loss function. We formulate first- and second-order gradients with respect to the learning rate as functions of consecutive weight gradients, leading to a cost-effective implementation. We also show that the scheme can be extended to accommodate different learning rates per layer. Extensive experimental evaluation is conducted, validating the effectiveness of the proposed method across a wide range of settings. The proposed method has proven robust to both the initial learning rate and the batch size, making it well suited as an off-the-shelf optimization scheme.
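
As a rough, first-order illustration of this idea, the sketch below updates the learning rate with an extra gradient descent step whose hypergradient is a function of consecutive weight gradients: under plain SGD, w_t = w_{t-1} - lr * g_{t-1}, so dL(w_t)/d(lr) = -g_t · g_{t-1}. The toy quadratic objective, the meta step size `beta`, and the non-negativity clamp are illustrative assumptions, not taken from the paper, and the sketch omits the paper's second-order term and per-layer learning rates.

```python
import numpy as np

# Hypothetical least-squares objective, used only to make the sketch runnable.
def loss_and_grad(w, A, b):
    r = A @ w - b
    return 0.5 * np.dot(r, r), A.T @ r

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
w = np.zeros(10)

lr = 1e-4        # initial learning rate (assumed)
beta = 1e-6      # step size for the learning-rate update (assumed hyperparameter)
prev_grad = None

for step in range(200):
    loss, grad = loss_and_grad(w, A, b)
    if prev_grad is not None:
        # dL(w_t)/d(lr) = -g_t . g_{t-1} for w_t = w_{t-1} - lr * g_{t-1},
        # so a gradient descent step on lr adds beta * (g_t . g_{t-1}).
        lr = max(lr + beta * np.dot(grad, prev_grad), 0.0)
    w = w - lr * grad
    prev_grad = grad

print(f"final loss {loss:.4f}, learned lr {lr:.2e}")
```

When consecutive gradients point in similar directions the learning rate grows, and when they oppose each other it shrinks; the extra step costs only one dot product per iteration, which is the kind of cost-effective implementation the abstract describes.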

