Search Results for author: Bobby He

Found 10 papers, 3 papers with code

Hallmarks of Optimization Trajectories in Neural Networks and LLMs: The Lengths, Bends, and Dead Ends

no code implementations • 12 Mar 2024 • Sidak Pal Singh, Bobby He, Thomas Hofmann, Bernhard Schölkopf

We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich structure of parameters contained within their optimization trajectories.

Recurrent Distance Filtering for Graph Representation Learning

no code implementations • 3 Dec 2023 • Yuhui Ding, Antonio Orvieto, Bobby He, Thomas Hofmann

Graph neural networks based on iterative one-hop message passing have been shown to struggle in harnessing the information from distant nodes effectively.

Graph Representation Learning • Inductive Bias +1
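The limitation the abstract describes can be seen in a toy sketch (not the paper's method): with one-hop message passing, a node needs k rounds before any information from a node k hops away can reach it. All names and the averaging aggregator here are illustrative.

```python
import numpy as np

# Toy path graph 0-1-2: adjacency matrix and one-hot node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)

# One round of one-hop message passing: each node averages its
# neighbours' features.
deg = A.sum(axis=1, keepdims=True)
H1 = (A @ H) / deg

# After a single round, node 0 has received nothing from node 2,
# which is two hops away.
print(H1[0, 2])  # 0.0
```

A second round (`H2 = (A @ H1) / deg`) finally propagates node 2's feature to node 0, illustrating why distant-node information is slow to arrive under iterative one-hop schemes.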

Simplifying Transformer Blocks

1 code implementation • 3 Nov 2023 • Bobby He, Thomas Hofmann

A simple design recipe for deep Transformers is to compose identical building blocks.
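The "compose identical building blocks" recipe can be sketched minimally as a stack of copies of one attention-plus-MLP block with skip connections. This is a generic illustration under assumed shapes and initialisation, not the simplified block the paper proposes.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def block(X, Wq, Wk, Wv, W1, W2):
    """One generic Transformer block: self-attention then a ReLU MLP,
    each wrapped in a skip connection (illustrative, not the paper's)."""
    d = X.shape[-1]
    attn = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d)) @ (X @ Wv)
    X = X + attn                              # skip around attention
    return X + np.maximum(X @ W1, 0) @ W2     # skip around the MLP

rng = np.random.default_rng(0)
d = 8
params = [rng.standard_normal((d, d)) * 0.1 for _ in range(5)]
X = rng.standard_normal((4, d))               # 4 tokens, width 8

for _ in range(6):                            # compose 6 identical blocks
    X = block(X, *params)
print(X.shape)  # (4, 8)
```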

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

no code implementations • NeurIPS 2023 • Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy

Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width.

Deep Attention • Learning Theory

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

no code implementations • 20 Feb 2023 • Bobby He, James Martens, Guodong Zhang, Aleksandar Botev, Andrew Brock, Samuel L Smith, Yee Whye Teh

Skip connections and normalisation layers form two standard architectural components that are ubiquitous for the training of Deep Neural Networks (DNNs), but whose precise roles are poorly understood.

UncertaINR: Uncertainty Quantification of End-to-End Implicit Neural Representations for Computed Tomography

1 code implementation • 22 Feb 2022 • Francisca Vasconcelos, Bobby He, Nalini Singh, Yee Whye Teh

To that end, we study a Bayesian reformulation of INRs, UncertaINR, in the context of computed tomography, and evaluate several Bayesian deep learning implementations in terms of accuracy and calibration.

Computed Tomography (CT) • Decision Making +2

Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

no code implementations • 22 Oct 2021 • Soufiane Hayou, Bobby He, Gintare Karolina Dziugaite

In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.

L2 Regularization • Regression

Feature Kernel Distillation

no code implementations • ICLR 2022 • Bobby He, Mete Ozay

Trained Neural Networks (NNs) can be viewed as data-dependent kernel machines, with predictions determined by the inner product of last-layer representations across inputs, referred to as the feature kernel.

Image Classification • Knowledge Distillation
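The feature-kernel view in the abstract can be sketched directly: with last-layer representations phi(x), the kernel entry K[i, j] is the inner product of phi(x_i) and phi(x_j). The fixed random ReLU feature map below is a stand-in assumption, not a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 16))      # stand-in last-layer weights

def phi(X):
    """Toy last-layer representation (ReLU features), not a trained NN."""
    return np.maximum(X @ W, 0)

X = rng.standard_normal((10, 5))      # 10 inputs of dimension 5
Phi = phi(X)
K = Phi @ Phi.T                       # feature kernel: Gram matrix of features

print(K.shape)  # (10, 10)
```

By construction K is symmetric and positive semi-definite, which is what makes the "data-dependent kernel machine" reading of a trained network well-posed.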

Stable ResNet

no code implementations • 24 Oct 2020 • Soufiane Hayou, Eugenio Clerico, Bobby He, George Deligiannidis, Arnaud Doucet, Judith Rousseau

Deep ResNet architectures have achieved state-of-the-art performance on many tasks.

Bayesian Deep Ensembles via the Neural Tangent Kernel

3 code implementations • NeurIPS 2020 • Bobby He, Balaji Lakshminarayanan, Yee Whye Teh

We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs).

Gaussian Processes
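The NTK mentioned in the abstract pairs gradients of the network output with respect to its parameters: Theta(x, x') = grad_theta f(x) . grad_theta f(x'). A minimal sketch, assuming the simplest case of a linear model f_theta(x) = theta . x, where that gradient is just the input itself:

```python
import numpy as np

def ntk_linear(X):
    """Empirical NTK for a linear model f_theta(x) = theta . x.
    grad_theta f(x) = x, so the NTK is the input Gram matrix."""
    G = X            # per-example parameter gradients, shape (n, d)
    return G @ G.T   # Theta[i, j] = grad f(x_i) . grad f(x_j)

X = np.array([[1.0, 0.0],
              [0.0, 2.0]])
Theta = ntk_linear(X)
print(Theta[0, 0], Theta[1, 1])  # 1.0 4.0
```

For general networks the per-example gradients must be computed by backpropagation, but the structure is the same: the NTK is the Gram matrix of parameter gradients, which is what links wide-network training dynamics to Gaussian processes.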
