no code implementations • 8 Dec 2023 • Lukas Balles, Cedric Archambeau, Giovanni Zappella
With increasing scale in model and dataset size, the training of deep neural networks becomes a massive computational burden.
no code implementations • 29 Nov 2023 • Martin Wistuba, Prabhu Teja Sivaprasad, Lukas Balles, Giovanni Zappella
Recent work using pretrained transformers has shown impressive performance when fine-tuned with data from the downstream problem of interest.
1 code implementation • 24 Apr 2023 • Martin Wistuba, Martin Ferianc, Lukas Balles, Cedric Archambeau, Giovanni Zappella
We discuss requirements for the use of continual learning algorithms in practice, from which we derive design principles for Renate.
2 code implementations • 14 Jul 2022 • Ondrej Bohdal, Lukas Balles, Martin Wistuba, Beyza Ermis, Cédric Archambeau, Giovanni Zappella
Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run.
no code implementations • 28 Mar 2022 • Lukas Balles, Giovanni Zappella, Cédric Archambeau
Most widely used continual learning (CL) methods rely on a rehearsal memory of data points that are reused while training on new data.
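For illustration, a minimal sketch of such a rehearsal memory using reservoir sampling; the class name and fixed capacity are placeholders and this is not the implementation from the paper:

```python
import random

class RehearsalMemory:
    """Minimal rehearsal memory using reservoir sampling (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []        # stored (x, y) examples
        self.num_seen = 0       # total examples observed so far

    def add(self, example):
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Replace a stored example with probability capacity / num_seen,
            # so every example seen so far is kept with equal probability.
            idx = random.randrange(self.num_seen)
            if idx < self.capacity:
                self.buffer[idx] = example

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```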
no code implementations • 9 Dec 2021 • Lukas Balles, Giovanni Zappella, Cédric Archambeau
We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset.
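A toy sketch of the gradient-matching idea, using a simple greedy selection over precomputed per-example gradients; the function name and the greedy strategy are illustrative and not the paper's exact algorithm:

```python
import numpy as np

def greedy_gradient_matching_coreset(per_example_grads, coreset_size):
    """Greedily pick examples whose mean gradient best matches the mean
    gradient of the full dataset. `per_example_grads` is an (N, D) array
    of per-example gradients."""
    target = per_example_grads.mean(axis=0)   # full-dataset gradient
    selected = []
    running_sum = np.zeros_like(target)
    for _ in range(coreset_size):
        best_idx, best_err = None, np.inf
        for i in range(len(per_example_grads)):
            if i in selected:
                continue
            # Mean gradient of the coreset if example i were added.
            candidate = (running_sum + per_example_grads[i]) / (len(selected) + 1)
            err = np.linalg.norm(candidate - target)
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
        running_sum += per_example_grads[best_idx]
    return selected
```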
no code implementations • NeurIPS Workshop ICBINB 2020 • Ricky T. Q. Chen, Dami Choi, Lukas Balles, David Duvenaud, Philipp Hennig
Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities, such as curvature, can help reduce sensitivity to common hyperparameters.
no code implementations • ICLR 2020 • Lukas Balles, Fabian Pedregosa, Nicolas Le Roux
Sign-based optimization methods have become popular in machine learning due to their favorable communication cost in distributed optimization and their surprisingly good performance in neural network training.
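A minimal sketch of the generic sign-based update (plain sign-SGD, given here only to illustrate the class of methods the snippet refers to):

```python
import numpy as np

def sign_sgd_step(params, grad, lr=0.01):
    """One sign-SGD step: only the sign of each gradient coordinate is used,
    so the update can be communicated with one bit per coordinate."""
    return params - lr * np.sign(grad)

# Toy usage on the quadratic f(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.array([2.0, -3.0, 0.5])
for _ in range(100):
    w = sign_sgd_step(w, grad=w, lr=0.05)
```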
1 code implementation • NeurIPS 2019 • Frederik Kunstner, Lukas Balles, Philipp Hennig
Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information.
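In symbols, the natural gradient update preconditions the loss gradient with the inverse Fisher information matrix (standard textbook form, not notation specific to the paper):

```latex
\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t),
\qquad
F(\theta) = \mathbb{E}_{p_\theta(y \mid x)}\!\left[ \nabla_\theta \log p_\theta(y \mid x)\, \nabla_\theta \log p_\theta(y \mid x)^\top \right].
```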
no code implementations • ICLR 2019 • Lukas Balles, Thomas Fischbacher
We introduce an analytic distance function for moderately sized point sets of known cardinality that is shown to have very desirable properties, both as a loss function and as a regularizer for machine learning applications.
1 code implementation • ICLR 2019 • Frank Schneider, Lukas Balles, Philipp Hennig
We suggest routines and benchmarks for stochastic optimization, with special focus on the unique aspects of deep learning, such as stochasticity, tunability and generalization.
1 code implementation • CVPR 2019 • Anurag Ranjan, Varun Jampani, Lukas Balles, Kihwan Kim, Deqing Sun, Jonas Wulff, Michael J. Black
We address the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions.
Ranked #66 on Monocular Depth Estimation on KITTI Eigen split
2 code implementations • ICML 2018 • Lukas Balles, Philipp Hennig
The ADAM optimizer is exceedingly popular in the deep learning community.
no code implementations • 28 Mar 2017 • Maren Mahsereci, Lukas Balles, Christoph Lassner, Philipp Hennig
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization.
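For contrast, a sketch of conventional validation-set early stopping; note the paper's criterion works without a held-out validation set, which this sketch does not implement, and `train_step` / `validate` are assumed user-supplied callables:

```python
def train_with_early_stopping(train_step, validate, max_epochs=100, patience=5):
    """Conventional validation-based early stopping (illustrative baseline).
    `train_step` runs one training epoch; `validate` returns a validation loss."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        val_loss = validate()
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break   # stop: no validation improvement for `patience` epochs
    return best_loss
```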
1 code implementation • 15 Dec 2016 • Lukas Balles, Javier Romero, Philipp Hennig
The batch size significantly influences the behavior of the stochastic optimization algorithm, since it determines the variance of the gradient estimates.
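Concretely, the variance of a mini-batch gradient estimate scales roughly as 1/B in the batch size B. A toy numerical check of this relationship on synthetic scalar "gradients" (purely illustrative, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)
# Per-example "gradients": scalars with true mean 1.0 and variance 4.0.
per_example = rng.normal(loc=1.0, scale=2.0, size=100_000)

for batch_size in (1, 10, 100):
    # Mini-batch gradient estimates: means over disjoint batches.
    usable = per_example[: (len(per_example) // batch_size) * batch_size]
    estimates = usable.reshape(-1, batch_size).mean(axis=1)
    print(batch_size, estimates.var())   # roughly 4.0 / batch_size
```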