DistProp: A Scalable Approach to Lagrangian Training via Distributional Approximation

29 Sep 2021 · Manuel Del Verme, Pierre-Luc Bacon

We develop a multiple shooting method for learning in deep neural networks based on the Lagrangian perspective on automatic differentiation. Our method leverages ideas from saddle-point optimization to derive stable first-order updates for a specific constrained optimization problem. Most importantly, we propose a novel solution that allows us to run our algorithm over mini-batches in a stochastic-gradient fashion and to decouple the number of auxiliary variables from the size of the dataset. We show empirically that our method reliably achieves higher accuracy than other comparable local (biologically plausible) learning methods on MNIST, CIFAR10 and ImageNet.
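
The abstract describes first-order saddle-point updates on the Lagrangian of the layer-wise constraints h_l = f_l(h_{l-1}; θ_l): gradient descent on the weights and the auxiliary activation variables, gradient ascent on the Lagrange multipliers. Below is a minimal PyTorch sketch of that general idea for a single hidden layer and one fixed mini-batch; it is not the authors' code, all names (W1, h1, lam, ...) are our own, and it omits the paper's key contribution of decoupling the number of auxiliary variables from the dataset size.

```python
# Sketch: saddle-point training of a 2-layer net with an auxiliary
# activation variable h1 and multipliers lam enforcing h1 = tanh(x @ W1).
# Illustrative only; assumed names, not the paper's implementation.
import torch

torch.manual_seed(0)
x = torch.randn(32, 10)            # one fixed mini-batch of inputs
y = torch.randint(0, 3, (32,))     # integer class labels

W1 = (0.1 * torch.randn(10, 20)).requires_grad_()  # layer-1 weights
W2 = (0.1 * torch.randn(20, 3)).requires_grad_()   # layer-2 weights
h1 = torch.zeros(32, 20, requires_grad=True)       # auxiliary activations
lam = torch.zeros(32, 20, requires_grad=True)      # Lagrange multipliers

lr, lr_dual = 0.1, 0.1
opt = torch.optim.SGD([W1, W2, h1], lr=lr)         # primal (descent) variables

for step in range(500):
    opt.zero_grad()
    if lam.grad is not None:
        lam.grad.zero_()

    defect = torch.tanh(x @ W1) - h1               # constraint residual
    loss = torch.nn.functional.cross_entropy(h1 @ W2, y)
    lagrangian = loss + (lam * defect).sum()
    lagrangian.backward()

    opt.step()                                     # descent on W1, W2, h1
    with torch.no_grad():
        lam += lr_dual * lam.grad                  # ascent on the multipliers
```

Note that in this sketch W1 receives its learning signal only through the multiplier term, not through a global backward pass over the whole network, which is what makes updates of this kind local in the sense the abstract invokes.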
