1 code implementation • 11 Mar 2024 • Gregor Bachmann, Vaishnavh Nagarajan
As a starting point, we argue that the two often-conflated phases of next-token prediction -- autoregressive inference and teacher-forced training -- must be treated distinctly.
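The distinction can be made concrete with a toy next-token predictor. The following sketch is hypothetical and not the paper's setup: a bigram lookup model stands in for a language model, and the two helper functions contrast teacher-forced evaluation (ground-truth prefixes at every step) with autoregressive rollout (the model's own outputs fed back in, so errors can compound).

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    # Toy "model": for each token, predict the token that most often
    # followed it in the training corpus.
    counts = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def teacher_forced_errors(model, sequence):
    # Training-style evaluation: the ground-truth prefix is supplied at
    # every step, so one early mistake does not corrupt later inputs.
    return sum(model.get(a) != b for a, b in zip(sequence, sequence[1:]))

def autoregressive_rollout(model, start, n):
    # Inference-style generation: each prediction is fed back as the next
    # input, so a single wrong step changes everything downstream.
    seq = [start]
    for _ in range(n):
        seq.append(model.get(seq[-1], start))
    return seq
```

On the corpus `"ababac"`, teacher forcing scores each step independently, while the rollout from `'a'` never visits `'c'` at all, illustrating how the two phases expose the model to different input distributions.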
no code implementations • 22 Feb 2024 • Dimitri von Rütte, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann
Concept guidance has emerged as a cheap and simple way to control the behavior of language models by probing their hidden representations for concept vectors and using them to perturb activations at inference time.
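A minimal version of this idea can be sketched with a difference-of-means probe; this is one common choice, not necessarily the probe used in the paper, and the arrays here stand in for real hidden activations.

```python
import numpy as np

def concept_vector(acts_with, acts_without):
    # Probe the hidden representations: take the difference of mean
    # activations between examples that exhibit the concept and examples
    # that do not, normalized to unit length.
    v = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden, v, alpha):
    # Perturb an activation at inference time along the concept
    # direction; alpha controls the strength (and sign) of the guidance.
    return hidden + alpha * v
```

Because the perturbation is a single vector addition per forward pass, the method adds essentially no inference cost, which is what makes concept guidance cheap relative to fine-tuning.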
no code implementations • 5 Feb 2024 • Kai Lion, Lorenzo Noci, Thomas Hofmann, Gregor Bachmann
The multi-modal nature of neural loss landscapes is often considered to be the main driver behind the empirical success of deep ensembles.
no code implementations • 15 Dec 2023 • Gul Sena Altintas, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann
Linear mode connectivity (LMC), or the lack thereof, is one of the intriguing characteristics of neural network loss landscapes.
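The standard LMC check can be sketched as follows. For illustration only, a quadratic loss on a linear model stands in for a trained network (where the barrier along the path is zero by convexity); in practice `w0` and `w1` would be two independently trained weight vectors.

```python
import numpy as np

def loss(w, X, y):
    # Mean-squared error of a linear model, a stand-in for a network loss.
    return np.mean((X @ w - y) ** 2)

def lmc_barrier(w0, w1, X, y, steps=21):
    # Loss barrier along the linear path (1-t)*w0 + t*w1: the maximum
    # excess of the interpolated loss over the straight line between the
    # endpoint losses. A near-zero barrier indicates the two solutions
    # are linearly mode-connected.
    ts = np.linspace(0.0, 1.0, steps)
    path = [loss((1 - t) * w0 + t * w1, X, y) for t in ts]
    ends = [(1 - t) * path[0] + t * path[-1] for t in ts]
    return max(p - e for p, e in zip(path, ends))
```

For nonconvex network losses the same scan can return a large positive barrier, which is exactly the "lack thereof" case.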
no code implementations • 6 Nov 2023 • Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann
This leads to the notion of a 'compute-optimal' model, i.e., a model that allocates a given level of compute during training optimally to maximize performance.
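The trade-off can be illustrated with a Chinchilla-style scaling law. The coefficients below are made up for illustration; the form L(N, D) = A/N^alpha + B/D^beta and the approximation C ≈ 6·N·D (compute as a function of parameter count N and training tokens D) are standard in the scaling-law literature, not results from this paper.

```python
import numpy as np

def best_allocation(C, A=400.0, B=2000.0, alpha=0.34, beta=0.28):
    # Hypothetical loss model L(N, D) = A/N**alpha + B/D**beta under the
    # compute constraint C ~ 6*N*D. Grid-search the parameter count N
    # that minimizes the loss at a fixed compute budget.
    Ns = np.logspace(6, 12, 601)          # candidate parameter counts
    Ds = C / (6 * Ns)                     # tokens implied by the budget
    L = A / Ns**alpha + B / Ds**beta
    i = L.argmin()
    return Ns[i], Ds[i]
```

Larger budgets shift the optimum toward both more parameters and more data, which is the sense in which a model of a fixed size is only compute-optimal for one particular budget.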
1 code implementation • NeurIPS 2023 • Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann
We show that the performance of MLPs drastically improves with scale (95% on CIFAR10, 82% on CIFAR100, 58% on ImageNet ReaL), highlighting that lack of inductive bias can indeed be compensated.
no code implementations • 4 Jun 2023 • Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann
Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore.
1 code implementation • 12 Apr 2023 • Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann
Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore.
1 code implementation • 23 Feb 2023 • Felix Sarnthein, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann
In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation.
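The teacher-student loop of self-distillation can be sketched in a setting where it is analytically transparent: ridge regression, where each "student" is refit on the previous model's predictions. This is an illustrative toy, not the paper's analysis.

```python
import numpy as np

def ridge(X, y, lam):
    # Closed-form ridge regression fit.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def self_distill(X, y, lam, rounds):
    # Self-distillation: train a model, then repeatedly train a fresh
    # "student" on the previous model's predictions X @ w instead of the
    # original labels y.
    w = ridge(X, y, lam)
    for _ in range(rounds):
        w = ridge(X, X @ w, lam)
    return w
```

In this toy setting each round contracts the solution along the data's eigendirections, so repeated self-distillation acts as progressively stronger regularization even though the loss function never changes.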
no code implementations • 25 Oct 2022 • Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann
While such a memorization capacity seems worrisome, in this work we show that under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way, i.e., they learn embeddings that lead to highly non-trivial performance under nearest neighbour probing.
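Nearest-neighbour probing itself is simple to state: freeze the learned embeddings and classify each query by the label of its closest training embedding. A minimal sketch, with toy 2-D vectors standing in for real network embeddings:

```python
import numpy as np

def nn_probe(train_emb, train_labels, query_emb):
    # Nearest-neighbour probe: assign each query the label of its closest
    # training embedding under Euclidean distance.
    d = np.linalg.norm(train_emb[None, :, :] - query_emb[:, None, :], axis=-1)
    return train_labels[d.argmin(axis=1)]
```

Because the probe has no trainable parameters, any non-trivial accuracy it achieves must come from structure already present in the embeddings, which is what makes it a clean test of whether memorization was benign.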
no code implementations • 27 May 2022 • Gregor Bachmann, Lorenzo Noci, Thomas Hofmann
While data augmentation has been empirically recognized as one of the main drivers of this effect, a theoretical account of its role is largely missing.
1 code implementation • ICLR 2022 • Gregor Bachmann, Thomas Hofmann, Aurélien Lucchi
Despite the tremendous empirical success of deep learning models at solving various learning tasks, our theoretical understanding of their generalization ability is very limited.
no code implementations • NeurIPS 2021 • Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann
Moreover, we demonstrate that our bounds remain faithful as an estimate of the numerical Hessian rank, for a larger class of models such as rectified and hyperbolic tangent networks.
no code implementations • NeurIPS 2021 • Lorenzo Noci, Gregor Bachmann, Kevin Roth, Sebastian Nowozin, Thomas Hofmann
Recent works on Bayesian neural networks (BNNs) have highlighted the need to better understand the implications of using Gaussian priors in combination with the compositional structure of the network architecture.
no code implementations • NeurIPS 2021 • Lorenzo Noci, Kevin Roth, Gregor Bachmann, Sebastian Nowozin, Thomas Hofmann
Regarding the dataset curation hypothesis of Aitchison (2020): we show empirically that the cold posterior effect (CPE) does not arise in a real curated dataset, but can be produced in a controlled experiment with varying curation strength.
no code implementations • 7 May 2021 • Gregor Bachmann, Seyed-Mohsen Moosavi-Dezfooli, Thomas Hofmann
For a specific dataset, it was observed that a neural network completely misclassifies a projection of the training data (an adversarial set), rendering any existing generalization bound based on uniform convergence vacuous.
no code implementations • ICML 2020 • Gregor Bachmann, Gary Bécigneul, Octavian-Eugen Ganea
Interest has recently been rising in methods that represent data in non-Euclidean spaces, e.g., hyperbolic or spherical ones, which provide specific inductive biases useful for certain real-world data properties, e.g., scale-free, hierarchical, or cyclical structure.