no code implementations • 8 Dec 2023 • Lukas Balles, Cedric Archambeau, Giovanni Zappella
With increasing scale in model and dataset size, the training of deep neural networks becomes a massive computational burden.
no code implementations • 29 Nov 2023 • Martin Wistuba, Prabhu Teja Sivaprasad, Lukas Balles, Giovanni Zappella
Recent work using pretrained transformers has shown impressive performance when fine-tuned with data from the downstream problem of interest.
1 code implementation • 24 Apr 2023 • Martin Wistuba, Martin Ferianc, Lukas Balles, Cedric Archambeau, Giovanni Zappella
We discuss requirements for the use of continual learning algorithms in practice, from which we derive design principles for Renate.
2 code implementations • 14 Jul 2022 • Ondrej Bohdal, Lukas Balles, Martin Wistuba, Beyza Ermis, Cédric Archambeau, Giovanni Zappella
Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run.
no code implementations • 28 Mar 2022 • Lukas Balles, Giovanni Zappella, Cédric Archambeau
Most widely used continual learning (CL) methods rely on a rehearsal memory of data points that are reused while training on new data.
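For illustration, a minimal sketch of such a rehearsal memory using reservoir sampling; the class name and fixed capacity are placeholders and this is not the implementation from the paper:

```python
import random

class RehearsalMemory:
    """Minimal rehearsal memory using reservoir sampling (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []        # stored (x, y) examples
        self.num_seen = 0       # total examples observed so far

    def add(self, example):
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Replace a stored example with probability capacity / num_seen,
            # so every example seen so far is kept with equal probability.
            idx = random.randrange(self.num_seen)
            if idx < self.capacity:
                self.buffer[idx] = example

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```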
no code implementations • 9 Dec 2021 • Lukas Balles, Giovanni Zappella, Cédric Archambeau
We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset.
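A toy sketch of the gradient-matching idea, using a simple greedy selection over precomputed per-example gradients; the function name and the greedy strategy are illustrative and not the paper's exact algorithm:

```python
import numpy as np

def greedy_gradient_matching_coreset(per_example_grads, coreset_size):
    """Greedily pick examples whose mean gradient best matches the mean
    gradient of the full dataset. `per_example_grads` is an (N, D) array
    of per-example gradients."""
    target = per_example_grads.mean(axis=0)   # full-dataset gradient
    selected = []
    running_sum = np.zeros_like(target)
    for _ in range(coreset_size):
        best_idx, best_err = None, np.inf
        for i in range(len(per_example_grads)):
            if i in selected:
                continue
            # Mean gradient of the coreset if example i were added.
            candidate = (running_sum + per_example_grads[i]) / (len(selected) + 1)
            err = np.linalg.norm(candidate - target)
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
        running_sum += per_example_grads[best_idx]
    return selected
```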
no code implementations • NeurIPS Workshop ICBINB 2020 • Ricky T. Q. Chen, Dami Choi, Lukas Balles, David Duvenaud, Philipp Hennig
Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities, such as curvature, can help reduce sensitivity to common hyperparameters.
no code implementations • ICLR 2020 • Lukas Balles, Fabian Pedregosa, Nicolas Le Roux
Sign-based optimization methods have become popular in machine learning due to their favorable communication cost in distributed optimization and their surprisingly good performance in neural network training.
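A minimal sketch of the generic sign-based update (plain sign-SGD, given here only to illustrate the class of methods the snippet refers to):

```python
import numpy as np

def sign_sgd_step(params, grad, lr=0.01):
    """One sign-SGD step: only the sign of each gradient coordinate is used,
    so the update can be communicated with one bit per coordinate."""
    return params - lr * np.sign(grad)

# Toy usage on the quadratic f(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.array([2.0, -3.0, 0.5])
for _ in range(100):
    w = sign_sgd_step(w, grad=w, lr=0.05)
```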
1 code implementation • NeurIPS 2019 • Frederik Kunstner, Lukas Balles, Philipp Hennig
Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information.
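In symbols, the natural gradient update preconditions the loss gradient with the inverse Fisher information matrix (standard textbook form, not notation specific to the paper):

```latex
\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t),
\qquad
F(\theta) = \mathbb{E}_{p_\theta(y \mid x)}\!\left[ \nabla_\theta \log p_\theta(y \mid x)\, \nabla_\theta \log p_\theta(y \mid x)^\top \right].
```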
no code implementations • ICLR 2019 • Lukas Balles, Thomas Fischbacher
We introduce an analytic distance function for moderately sized point sets of known cardinality that is shown to have very desirable properties, both as a loss function and as a regularizer for machine learning applications.
1 code implementation • ICLR 2019 • Frank Schneider, Lukas Balles, Philipp Hennig
We suggest routines and benchmarks for stochastic optimization, with special focus on the unique aspects of deep learning, such as stochasticity, tunability and generalization.
1 code implementation • CVPR 2019 • Anurag Ranjan, Varun Jampani, Lukas Balles, Kihwan Kim, Deqing Sun, Jonas Wulff, Michael J. Black
We address the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions.
Ranked #66 on Monocular Depth Estimation on KITTI Eigen split
2 code implementations • ICML 2018 • Lukas Balles, Philipp Hennig
The ADAM optimizer is exceedingly popular in the deep learning community.
no code implementations • 28 Mar 2017 • Maren Mahsereci, Lukas Balles, Christoph Lassner, Philipp Hennig
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization.
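For contrast, a sketch of conventional validation-set early stopping; note the paper's criterion works without a held-out validation set, which this sketch does not implement, and `train_step` / `validate` are assumed user-supplied callables:

```python
def train_with_early_stopping(train_step, validate, max_epochs=100, patience=5):
    """Conventional validation-based early stopping (illustrative baseline).
    `train_step` runs one training epoch; `validate` returns a validation loss."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        val_loss = validate()
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break   # stop: no validation improvement for `patience` epochs
    return best_loss
```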
1 code implementation • 15 Dec 2016 • Lukas Balles, Javier Romero, Philipp Hennig
The batch size significantly influences the behavior of the stochastic optimization algorithm, since it determines the variance of the gradient estimates.
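Concretely, the variance of a mini-batch gradient estimate scales roughly as 1/B in the batch size B. A toy numerical check of this relationship on synthetic scalar "gradients" (purely illustrative, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)
# Per-example "gradients": scalars with true mean 1.0 and variance 4.0.
per_example = rng.normal(loc=1.0, scale=2.0, size=100_000)

for batch_size in (1, 10, 100):
    # Mini-batch gradient estimates: means over disjoint batches.
    usable = per_example[: (len(per_example) // batch_size) * batch_size]
    estimates = usable.reshape(-1, batch_size).mean(axis=1)
    print(batch_size, estimates.var())   # roughly 4.0 / batch_size
```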