no code implementations • 22 Feb 2024 • Dimitri von Rütte, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann
Concept guidance has emerged as a cheap and simple way to control the behavior of language models by probing their hidden representations for concept vectors and using them to perturb activations at inference time.
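To make the mechanism concrete, here is a minimal sketch of concept guidance via activation steering in PyTorch. The difference-of-means probe, the hook placement, and the guidance strength `alpha` are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: extract a concept direction from hidden activations, then perturb
# a layer's output along that direction at inference time.
import torch

def fit_concept_vector(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """Difference-of-means probe: one simple way to get a concept direction."""
    direction = pos_acts.mean(dim=0) - neg_acts.mean(dim=0)
    return direction / direction.norm()

def add_guidance_hook(layer: torch.nn.Module, direction: torch.Tensor, alpha: float = 5.0):
    """Shift the layer's output along the concept direction at inference time."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden)  # match device/dtype
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)
```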
1 code implementation • 12 Feb 2024 • Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh
Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice often resulting in significant accuracy loss that necessitates subsequent fine-tuning efforts.
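For reference, a sketch of the conventional baseline the abstract describes: score output neurons of a linear layer by weight magnitude and discard the least important ones. The paper itself proposes an alternative to this discard-and-finetune setup; the scoring rule and keep ratio below are assumptions.

```python
import torch

def prune_linear(layer: torch.nn.Linear, keep_ratio: float = 0.5) -> torch.nn.Linear:
    importance = layer.weight.abs().sum(dim=1)       # one score per output neuron
    k = max(1, int(keep_ratio * layer.out_features))
    keep = importance.topk(k).indices.sort().values  # neurons to keep
    pruned = torch.nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    # Note: the consuming layer's input columns must be sliced to `keep` as well.
    return pruned
```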
no code implementations • 10 Nov 2023 • Elior Benarous, Sotiris Anagnostidis, Luca Biggio, Thomas Hofmann
In this study, we investigate how neural networks exhibit shape bias during training on synthetic datasets, serving as an indicator of the synthetic data quality.
no code implementations • 6 Nov 2023 • Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann
This leads to the notion of a "compute-optimal" model, i.e. a model that allocates a given level of compute during training optimally to maximize performance.
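As a back-of-the-envelope illustration of compute-optimal allocation, the sketch below uses the common approximation C ≈ 6·N·D (training FLOPs ≈ 6 × parameters × tokens) together with a Chinchilla-style tokens-per-parameter heuristic. The constants are illustrative, not the paper's fitted values.

```python
import math

def compute_optimal(C: float, tokens_per_param: float = 20.0):
    """Split a FLOP budget C into a model size N (params) and token count D."""
    N = math.sqrt(C / (6.0 * tokens_per_param))  # from C = 6*N*D with D = r*N
    D = tokens_per_param * N                     # training tokens
    return N, D

N, D = compute_optimal(1e21)
print(f"~{N:.2e} params trained on ~{D:.2e} tokens")  # roughly 3e9 params, 6e10 tokens
```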
1 code implementation • 9 Oct 2023 • Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh
Fusion is a technique for merging multiple independently trained neural networks in order to combine their capabilities.
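The simplest baseline for fusion is "vanilla averaging": average parameters entry-wise across models. The paper's method first aligns corresponding neurons (e.g., via optimal transport) before merging; this sketch omits alignment and assumes both models share an architecture.

```python
import torch

def average_fuse(model_a: torch.nn.Module, model_b: torch.nn.Module) -> dict:
    """Return a fused state_dict by entry-wise parameter averaging."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    return {k: 0.5 * (sd_a[k] + sd_b[k]) for k in sd_a}

# Usage: net_a.load_state_dict(average_fuse(net_a, net_b))
```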
1 code implementation • NeurIPS 2023 • Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann
We show that the performance of MLPs drastically improves with scale (95% on CIFAR10, 82% on CIFAR100, 58% on ImageNet ReaL), highlighting that lack of inductive bias can indeed be compensated.
no code implementations • 14 Apr 2023 • Andreas Köpf, Yannic Kilcher, Dimitri von Rütte, Sotiris Anagnostidis, Zhi-Rui Tam, Keith Stevens, Abdullah Barhoum, Nguyen Minh Duc, Oliver Stanley, Richárd Nagyfi, Shahul ES, Sameer Suri, David Glushkov, Arnav Dantuluri, Andrew Maguire, Christoph Schuhmann, Huu Nguyen, Alexander Mattick
In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations, a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 complete and fully annotated conversation trees.
1 code implementation • 23 Feb 2023 • Felix Sarnthein, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann
In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation.
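A minimal self-distillation training step, for orientation: the student matches the soft targets of a frozen teacher with the same architecture (typically a copy of an earlier model). The temperature and loss weighting below are illustrative choices, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of soft (teacher-matching) and hard (label) losses."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard temperature rescaling of the gradient magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```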
no code implementations • 22 Nov 2022 • Sotiris Anagnostidis, Arne Thomsen, Tomasz Kacprzak, Tilman Tröster, Luca Biggio, Alexandre Refregier, Thomas Hofmann
In this work, we aim to improve upon two-point statistics by employing a PointNet-like neural network to regress the values of the cosmological parameters directly from point cloud data.
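A schematic PointNet-style regressor: a shared per-point MLP, a permutation-invariant max-pool, and a head that outputs cosmological parameters. Layer widths and the two-parameter output (e.g., Omega_m and sigma_8) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PointNetRegressor(nn.Module):
    def __init__(self, in_dim=3, n_params=2):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, n_params))

    def forward(self, points):            # points: (batch, n_points, in_dim)
        feats = self.point_mlp(points)    # shared MLP applied per point
        pooled = feats.max(dim=1).values  # order-invariant global feature
        return self.head(pooled)          # predicted cosmological parameters
```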
no code implementations • 25 Oct 2022 • Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann
While such a memorization capacity seems worrisome, in this work we show that under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way, i.e. they learn embeddings that lead to highly non-trivial performance under nearest neighbour probing.
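Nearest-neighbour probing, as the abstract uses the term, evaluates a frozen network's embeddings by classifying held-out points via their closest labelled embedding. The 1-NN choice and cosine metric in this sketch are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def nn_probe_accuracy(embed, train_x, train_y, test_x, test_y):
    """embed: the frozen network's feature extractor."""
    e_train = F.normalize(embed(train_x), dim=-1)
    e_test = F.normalize(embed(test_x), dim=-1)
    nearest = (e_test @ e_train.T).argmax(dim=-1)  # cosine-similarity 1-NN
    return (train_y[nearest] == test_y).float().mean().item()
```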
no code implementations • ICCV 2023 • Sotiris Anagnostidis, Aurelien Lucchi, Thomas Hofmann
Accurately predicting road networks from satellite images requires a global understanding of the network topology.
no code implementations • 7 Jun 2022 • Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurelien Lucchi
First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization.
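One common diagnostic for rank collapse, sketched under the assumption that tokens are stacked as rows of a matrix X: measure the relative residual after removing the best rank-one approximation. Values near zero indicate all token representations have collapsed onto a single direction.

```python
import torch

def rank_collapse_residual(X: torch.Tensor) -> float:
    """X: (n_tokens, d). Relative residual after removing the top singular direction."""
    U, S, Vh = torch.linalg.svd(X, full_matrices=False)
    rank_one = S[0] * torch.outer(U[:, 0], Vh[0])
    return (torch.linalg.norm(X - rank_one) / torch.linalg.norm(X)).item()
```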
no code implementations • 22 Feb 2021 • Sotiris Anagnostidis, Aurelien Lucchi, Youssef Diouane
Recent applications in machine learning have renewed the interest of the community in min-max optimization problems.
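For context, the simplest algorithm for min-max problems is simultaneous gradient descent-ascent, shown here on the toy bilinear objective f(x, y) = x·y (an assumed example, not from the paper). On this objective plain descent-ascent is known to cycle or diverge, which is part of what motivates specialized min-max methods.

```python
import torch

# min_x max_y f(x, y) = x * y
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(1.0, requires_grad=True)
lr = 0.1
for _ in range(100):
    f = x * y
    gx, gy = torch.autograd.grad(f, [x, y])
    with torch.no_grad():
        x -= lr * gx  # descent step on the minimizing variable
        y += lr * gy  # ascent step on the maximizing variable
```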