PyTorch CurveBall - A second-order optimizer for deep networks

21 May 2018  ·  João F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi

We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers. Compared to stochastic gradient descent (SGD), it only requires two additional forward-mode automatic differentiation operations per iteration, which has a computational cost comparable to two standard forward passes and is easy to implement. Our method addresses long-standing issues with current second-order solvers, which invert an approximate Hessian matrix every iteration exactly or by conjugate-gradient methods, a procedure that is both costly and sensitive to noise. Instead, we propose to keep a single estimate of the gradient projected by the inverse Hessian matrix, and update it once per iteration. This estimate has the same size and is similar to the momentum variable that is commonly used in SGD. No estimate of the Hessian is maintained. We first validate our method, called CurveBall, on small problems with known closed-form solutions (noisy Rosenbrock function and degenerate 2-layer linear networks), where current deep learning solvers seem to struggle. We then train several large models on CIFAR and ImageNet, including ResNet and VGG-f networks, where we demonstrate faster convergence with no hyperparameter tuning. Code is available.
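For intuition, the update maintains a single momentum-sized variable z that tracks the Hessian-projected gradient -H⁻¹g, interleaved with the parameter step. Below is a minimal sketch of this idea on a toy quadratic, not the authors' implementation: it substitutes PyTorch's double-backward Hessian-vector product for the paper's two forward-mode AD passes with a Gauss-Newton curvature approximation, and uses fixed illustrative values for rho and beta where the paper derives them in closed form each iteration.

```python
import torch

# Minimal sketch of a CurveBall-style update (an illustrative toy, not the
# authors' implementation; see assumptions in the text above). Curvature is
# accessed only through a Hessian-vector product, here via double backward,
# on a convex quadratic whose exact minimizer is known.

torch.manual_seed(0)
A = torch.diag(torch.linspace(1.0, 10.0, 5))  # well-conditioned Hessian
b = torch.randn(5)

def f(w):
    return 0.5 * w @ A @ w - b @ w            # minimizer: w* = A^{-1} b

w = torch.zeros(5, requires_grad=True)
z = torch.zeros(5)              # single estimate of -H^{-1} g (momentum-sized)
rho, beta = 0.9, 0.1            # illustrative fixed values, not from the paper

for _ in range(200):
    loss = f(w)
    (g,) = torch.autograd.grad(loss, w, create_graph=True)
    (Hz,) = torch.autograd.grad(g @ z, w)     # Hessian-vector product H z
    z = rho * z - beta * (Hz + g.detach())    # refine z toward -H^{-1} g
    with torch.no_grad():
        w += z                                # parameter step, as in SGD momentum

print(torch.allclose(w.detach(), torch.linalg.solve(A, b), atol=1e-4))  # expect: True
```

The key point the sketch illustrates is that no Hessian is ever stored: each iteration touches only the vectors g, z, and Hz, which is what keeps the per-iteration memory and cost close to SGD with momentum.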
