1 code implementation • 11 Mar 2024 • Gregor Bachmann, Vaishnavh Nagarajan
As a starting point, we argue that the two often-conflated phases of next-token prediction -- autoregressive inference and teacher-forced training -- must be treated distinctly.
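The distinction can be made concrete with a toy next-token predictor. The following sketch is hypothetical and not the paper's setup: a bigram lookup model stands in for a language model, and the two helper functions contrast teacher-forced evaluation (ground-truth prefixes at every step) with autoregressive rollout (the model's own outputs fed back in, so errors can compound).

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    # Toy "model": for each token, predict the token that most often
    # followed it in the training corpus.
    counts = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def teacher_forced_errors(model, sequence):
    # Training-style evaluation: the ground-truth prefix is supplied at
    # every step, so one early mistake does not corrupt later inputs.
    return sum(model.get(a) != b for a, b in zip(sequence, sequence[1:]))

def autoregressive_rollout(model, start, n):
    # Inference-style generation: each prediction is fed back as the next
    # input, so a single wrong step changes everything downstream.
    seq = [start]
    for _ in range(n):
        seq.append(model.get(seq[-1], start))
    return seq
```

On the corpus `"ababac"`, teacher forcing scores each step independently, while the rollout from `'a'` never visits `'c'` at all, illustrating how the two phases expose the model to different input distributions.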
no code implementations • 22 Feb 2024 • Dimitri von Rütte, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann
Concept guidance has emerged as a cheap and simple way to control the behavior of language models by probing their hidden representations for concept vectors and using them to perturb activations at inference time.
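A minimal version of this idea can be sketched with a difference-of-means probe; this is one common choice, not necessarily the probe used in the paper, and the arrays here stand in for real hidden activations.

```python
import numpy as np

def concept_vector(acts_with, acts_without):
    # Probe the hidden representations: take the difference of mean
    # activations between examples that exhibit the concept and examples
    # that do not, normalized to unit length.
    v = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden, v, alpha):
    # Perturb an activation at inference time along the concept
    # direction; alpha controls the strength (and sign) of the guidance.
    return hidden + alpha * v
```

Because the perturbation is a single vector addition per forward pass, the method adds essentially no inference cost, which is what makes concept guidance cheap relative to fine-tuning.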
no code implementations • 5 Feb 2024 • Kai Lion, Lorenzo Noci, Thomas Hofmann, Gregor Bachmann
The multi-modal nature of neural loss landscapes is often considered to be the main driver behind the empirical success of deep ensembles.
no code implementations • 15 Dec 2023 • Gul Sena Altintas, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann
Linear mode connectivity (LMC), or the lack thereof, is one of the intriguing characteristics of neural network loss landscapes.
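The standard LMC check can be sketched as follows. For illustration only, a quadratic loss on a linear model stands in for a trained network (where the barrier along the path is zero by convexity); in practice `w0` and `w1` would be two independently trained weight vectors.

```python
import numpy as np

def loss(w, X, y):
    # Mean-squared error of a linear model, a stand-in for a network loss.
    return np.mean((X @ w - y) ** 2)

def lmc_barrier(w0, w1, X, y, steps=21):
    # Loss barrier along the linear path (1-t)*w0 + t*w1: the maximum
    # excess of the interpolated loss over the straight line between the
    # endpoint losses. A near-zero barrier indicates the two solutions
    # are linearly mode-connected.
    ts = np.linspace(0.0, 1.0, steps)
    path = [loss((1 - t) * w0 + t * w1, X, y) for t in ts]
    ends = [(1 - t) * path[0] + t * path[-1] for t in ts]
    return max(p - e for p, e in zip(path, ends))
```

For nonconvex network losses the same scan can return a large positive barrier, which is exactly the "lack thereof" case.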
no code implementations • 6 Nov 2023 • Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann
This leads to the notion of a 'compute-optimal' model, i.e., a model that allocates a given level of compute during training optimally to maximize performance.
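The trade-off can be illustrated with a Chinchilla-style scaling law. The coefficients below are made up for illustration; the form L(N, D) = A/N^alpha + B/D^beta and the approximation C ≈ 6·N·D (compute as a function of parameter count N and training tokens D) are standard in the scaling-law literature, not results from this paper.

```python
import numpy as np

def best_allocation(C, A=400.0, B=2000.0, alpha=0.34, beta=0.28):
    # Hypothetical loss model L(N, D) = A/N**alpha + B/D**beta under the
    # compute constraint C ~ 6*N*D. Grid-search the parameter count N
    # that minimizes the loss at a fixed compute budget.
    Ns = np.logspace(6, 12, 601)          # candidate parameter counts
    Ds = C / (6 * Ns)                     # tokens implied by the budget
    L = A / Ns**alpha + B / Ds**beta
    i = L.argmin()
    return Ns[i], Ds[i]
```

Larger budgets shift the optimum toward both more parameters and more data, which is the sense in which a model of a fixed size is only compute-optimal for one particular budget.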
1 code implementation • NeurIPS 2023 • Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann
We show that the performance of MLPs drastically improves with scale (95% on CIFAR10, 82% on CIFAR100, 58% on ImageNet ReaL), highlighting that lack of inductive bias can indeed be compensated.
no code implementations • 4 Jun 2023 • Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann
Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore.
1 code implementation • 12 Apr 2023 • Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann
Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore.
1 code implementation • 23 Feb 2023 • Felix Sarnthein, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann
In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation.
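The teacher-student loop of self-distillation can be sketched in a setting where it is analytically transparent: ridge regression, where each "student" is refit on the previous model's predictions. This is an illustrative toy, not the paper's analysis.

```python
import numpy as np

def ridge(X, y, lam):
    # Closed-form ridge regression fit.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def self_distill(X, y, lam, rounds):
    # Self-distillation: train a model, then repeatedly train a fresh
    # "student" on the previous model's predictions X @ w instead of the
    # original labels y.
    w = ridge(X, y, lam)
    for _ in range(rounds):
        w = ridge(X, X @ w, lam)
    return w
```

In this toy setting each round contracts the solution along the data's eigendirections, so repeated self-distillation acts as progressively stronger regularization even though the loss function never changes.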
no code implementations • 25 Oct 2022 • Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann
While such a memorization capacity seems worrisome, in this work we show that under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way, i.e., they learn embeddings that lead to highly non-trivial performance under nearest neighbour probing.
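Nearest-neighbour probing itself is simple to state: freeze the learned embeddings and classify each query by the label of its closest training embedding. A minimal sketch, with toy 2-D vectors standing in for real network embeddings:

```python
import numpy as np

def nn_probe(train_emb, train_labels, query_emb):
    # Nearest-neighbour probe: assign each query the label of its closest
    # training embedding under Euclidean distance.
    d = np.linalg.norm(train_emb[None, :, :] - query_emb[:, None, :], axis=-1)
    return train_labels[d.argmin(axis=1)]
```

Because the probe has no trainable parameters, any non-trivial accuracy it achieves must come from structure already present in the embeddings, which is what makes it a clean test of whether memorization was benign.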
no code implementations • 27 May 2022 • Gregor Bachmann, Lorenzo Noci, Thomas Hofmann
While data augmentation has been empirically recognized as one of the main drivers of this effect, a theoretical account of its role is largely missing.
1 code implementation • ICLR 2022 • Gregor Bachmann, Thomas Hofmann, Aurélien Lucchi
Despite the tremendous empirical success of deep learning models at solving various learning tasks, our theoretical understanding of their generalization ability is very limited.
no code implementations • NeurIPS 2021 • Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann
Moreover, we demonstrate that our bounds remain faithful as an estimate of the numerical Hessian rank, for a larger class of models such as rectified and hyperbolic tangent networks.
no code implementations • NeurIPS 2021 • Lorenzo Noci, Gregor Bachmann, Kevin Roth, Sebastian Nowozin, Thomas Hofmann
Recent works on Bayesian neural networks (BNNs) have highlighted the need to better understand the implications of using Gaussian priors in combination with the compositional structure of the network architecture.
no code implementations • NeurIPS 2021 • Lorenzo Noci, Kevin Roth, Gregor Bachmann, Sebastian Nowozin, Thomas Hofmann
Regarding the dataset curation hypothesis of Aitchison (2020): we show empirically that the cold posterior effect (CPE) does not arise in a real curated dataset, but can be produced in a controlled experiment with varying curation strength.
no code implementations • 7 May 2021 • Gregor Bachmann, Seyed-Mohsen Moosavi-Dezfooli, Thomas Hofmann
For a specific dataset, it was observed that a neural network completely misclassifies a projection of the training data (an adversarial set), rendering any existing generalization bound based on uniform convergence vacuous.
no code implementations • ICML 2020 • Gregor Bachmann, Gary Bécigneul, Octavian-Eugen Ganea
Interest has recently been rising in methods that represent data in non-Euclidean spaces, e.g., hyperbolic or spherical ones, which provide specific inductive biases useful for certain real-world data properties, e.g., scale-free, hierarchical, or cyclical structure.