Search Results for author: Blake Bordelon

Found 17 papers, 9 papers with code

A Dynamical Model of Neural Scaling Laws

no code implementations • 2 Feb 2024 • Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

On a variety of tasks, the performance of neural networks predictably improves with training time, dataset size and model size across many orders of magnitude.
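The abstract above describes performance improving predictably across many orders of magnitude in training time, data, and model size. As a purely illustrative companion (not the paper's dynamical model), the sketch below fits the usual empirical summary of such trends, a power law L(N) ≈ A·N^(−α), to hypothetical loss-versus-model-size measurements via a log-log linear fit.

```python
# Illustrative only: fit a power law L(N) ~ A * N^(-alpha) to synthetic
# loss-vs-model-size data; this is the usual empirical summary of scaling
# trends, not the paper's dynamical model.
import numpy as np

rng = np.random.default_rng(0)
N = np.array([1e5, 3e5, 1e6, 3e6, 1e7])                         # hypothetical model sizes
loss = 2.5 * N ** -0.3 * np.exp(rng.normal(0.0, 0.02, N.size))  # noisy power law

# A pure power law is a straight line in log-log coordinates.
slope, intercept = np.polyfit(np.log(N), np.log(loss), deg=1)
alpha, A = -slope, np.exp(intercept)
print(f"fitted exponent alpha ~ {alpha:.3f}, prefactor A ~ {A:.2f}")
```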

Grokking as the Transition from Lazy to Rich Training Dynamics

no code implementations • 9 Oct 2023 • Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan

We identify sufficient statistics for the test loss of such a network, and tracking these over training reveals that grokking arises in this setting when the network first attempts to fit a kernel regression solution with its initial features, followed by late-time feature learning where a generalizing solution is identified after train loss is already low.

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

no code implementations • 28 Sep 2023 • Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan

We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet.

Loss Dynamics of Temporal Difference Reinforcement Learning

1 code implementation • NeurIPS 2023 • Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan

We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function.

reinforcement-learning
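For context on the setting this entry studies, here is a minimal temporal-difference sketch: TD(0) value estimation with linear (one-hot) features on the classic five-state random-walk chain. The environment, learning rate, and discount factor are illustrative choices, not the paper's setup.

```python
# A minimal TD(0) sketch with linear (one-hot) features on the classic
# five-state random-walk chain. Environment and hyperparameters are
# illustrative, not the paper's setting.
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma, lr = 5, 1.0, 0.1
true_v = np.arange(1, n_states + 1) / (n_states + 1)  # exact values for this chain
features = np.eye(n_states)                            # one-hot (tabular) features
w = np.zeros(n_states)

for episode in range(200):
    s = n_states // 2                                  # start in the middle state
    while True:
        s_next = s + rng.choice([-1, 1])
        if s_next < 0:                                 # left terminal: reward 0
            target = 0.0
        elif s_next >= n_states:                       # right terminal: reward 1
            target = 1.0
        else:                                          # bootstrap from the next state
            target = gamma * (w @ features[s_next])
        w += lr * (target - w @ features[s]) * features[s]
        if s_next < 0 or s_next >= n_states:
            break
        s = s_next

print("RMS value error:", np.sqrt(np.mean((w - true_v) ** 2)))
```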

Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks

1 code implementation • NeurIPS 2023 • Blake Bordelon, Cengiz Pehlevan

However, in the rich, feature learning regime, the fluctuations of the kernels and predictions are dynamically coupled with a variance that can be computed self-consistently.

The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes

1 code implementation • 23 Dec 2022 • Alexander Atanasov, Blake Bordelon, Sabarish Sainathan, Cengiz Pehlevan

For small training set sizes $P$, the generalization error of wide neural networks is well-approximated by the error of an infinite width neural network (NN), either in the kernel or mean-field/feature-learning regime.

regression
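A hedged illustration of finite-width fluctuations around an infinite-width limit, using random-feature ridge regression as a lazy-regime stand-in rather than the paper's trained networks: the variance of test predictions over random initializations shrinks as the width grows.

```python
# Hedged sketch of finite-width fluctuations around an infinite-width limit,
# using random-feature ridge regression as a lazy-regime stand-in (not the
# paper's trained networks): prediction variance over random initializations
# shrinks as width grows.
import numpy as np

rng = np.random.default_rng(0)
d, P, n_test, ridge = 5, 50, 200, 1e-3
X_tr = rng.standard_normal((P, d))
y_tr = np.sin(X_tr[:, 0])                              # hypothetical target
X_te = rng.standard_normal((n_test, d))

def rf_predict(width, seed):
    W = np.random.default_rng(seed).standard_normal((width, d))
    feat = lambda X: np.maximum(X @ W.T, 0.0) / np.sqrt(width)
    F = feat(X_tr)
    coef = np.linalg.solve(F.T @ F + ridge * np.eye(width), F.T @ y_tr)
    return feat(X_te) @ coef

for width in [64, 256, 1024]:
    preds = np.stack([rf_predict(width, seed) for seed in range(20)])
    print(f"width={width:5d}  mean prediction variance {preds.var(axis=0).mean():.5f}")
```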

The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks

no code implementations • 5 Oct 2022 • Blake Bordelon, Cengiz Pehlevan

In the lazy limit, we find that DFA and Hebb can only learn using the last layer features, while full FA can utilize earlier layers with a scale determined by the initial correlation between feedforward and feedback weight matrices.
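The feedback-alignment (FA) family of rules mentioned above propagates the error backward through a fixed random feedback matrix rather than the transpose of the readout weights. Below is a minimal numpy sketch of FA on a toy two-layer ReLU regression problem; the architecture, target, and step size are illustrative, and this is not the paper's wide-network analysis.

```python
# Minimal feedback-alignment (FA) sketch on a toy two-layer ReLU regression
# problem. FA propagates the error through a fixed random feedback vector B;
# backprop would use w2 in its place. Purely illustrative.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid, n = 10, 100, 200
X = rng.standard_normal((n, d_in))
y = np.sin(X[:, 0])                                    # hypothetical scalar target

W1 = rng.standard_normal((d_hid, d_in)) / np.sqrt(d_in)
w2 = rng.standard_normal(d_hid) / np.sqrt(d_hid)
B = rng.standard_normal(d_hid)                         # fixed random feedback vector

lr = 0.05
for step in range(500):
    h = np.maximum(W1 @ X.T, 0.0)                      # hidden activations, (d_hid, n)
    err = w2 @ h - y                                   # residuals, (n,)
    w2 -= lr * (h @ err) / n                           # readout update (same as backprop)
    delta_h = np.outer(B, err) * (h > 0)               # FA: error sent back through B
    W1 -= lr * (delta_h @ X) / n

final_mse = np.mean((w2 @ np.maximum(W1 @ X.T, 0.0) - y) ** 2)
print("train MSE after FA training:", final_mse)
```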

Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks

no code implementations • 19 May 2022 • Blake Bordelon, Cengiz Pehlevan

We analyze feature learning in infinite-width neural networks trained with gradient flow through a self-consistent dynamical field theory.

Neural Networks as Kernel Learners: The Silent Alignment Effect

no code implementations • ICLR 2022 • Alexander Atanasov, Blake Bordelon, Cengiz Pehlevan

Can neural networks in the rich feature learning regime learn a kernel machine with a data-dependent kernel?

Learning Curves for SGD on Structured Features

1 code implementation • ICLR 2022 • Blake Bordelon, Cengiz Pehlevan

To analyze the influence of data structure on test loss dynamics, we study an exactly solvable model of stochastic gradient descent (SGD) on mean square loss which predicts test loss when training on features with arbitrary covariance structure.

BIG-bench Machine Learning · Feature Correlation · +1
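A hedged sketch of the kind of model such a theory solves: online SGD on squared loss with Gaussian features whose covariance has a power-law spectrum. The spectrum, target, and step size are illustrative, and the closed-form population loss used below relies on the Gaussian-feature assumption.

```python
# Hedged sketch: online SGD on squared loss with Gaussian features whose
# covariance has a power-law spectrum. Spectrum, target, and step size are
# illustrative choices, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
d = 200
eigs = np.arange(1, d + 1, dtype=float) ** -1.5        # power-law feature covariance
w_star = rng.standard_normal(d)                        # hypothetical target weights
w = np.zeros(d)
lr = 0.05

for t in range(5001):
    x = np.sqrt(eigs) * rng.standard_normal(d)         # feature with covariance diag(eigs)
    y = w_star @ x
    w -= lr * (w @ x - y) * x                          # one SGD step on (w @ x - y)^2 / 2
    if t % 1000 == 0:
        # Population test loss has a closed form under the Gaussian-feature assumption.
        test_loss = 0.5 * np.sum(eigs * (w - w_star) ** 2)
        print(f"step {t:5d}  test loss {test_loss:.4f}")
```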

Out-of-Distribution Generalization in Kernel Regression

1 code implementation • NeurIPS 2021 • Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan

Here, we study generalization in kernel regression when the training and test distributions are different using methods from statistical physics.

BIG-bench Machine Learning · Out-of-Distribution Generalization · +1

A Theory of Neural Tangent Kernel Alignment and Its Influence on Training

no code implementations • 29 May 2021 • Haozhe Shan, Blake Bordelon

In this work, we seek to theoretically understand kernel alignment, a prominent and ubiquitous structural change that aligns the NTK with the target function.
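One common way to quantify how well a kernel is aligned with a target is the Cristianini-style kernel-target alignment; the paper's precise definition of NTK alignment may differ, so treat the sketch below as a generic illustration.

```python
# One common alignment measure: Cristianini-style kernel-target alignment.
# The paper's precise definition of NTK alignment may differ; this is a
# generic illustration only.
import numpy as np

def kernel_target_alignment(K, y):
    """A(K, y) = y^T K y / (||K||_F ||y||^2); lies in [0, 1] for PSD K."""
    return float(y @ K @ y) / (np.linalg.norm(K, "fro") * float(y @ y))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = np.sign(X[:, 0])                                   # hypothetical target
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq_dists)                            # RBF kernel as a stand-in for an NTK
print("alignment:", kernel_target_alignment(K, y))
```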

Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks

1 code implementation • 23 Jun 2020 • Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan

We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep neural networks in the infinite-width limit.

BIG-bench Machine Learning · Inductive Bias · +1

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

1 code implementation • ICML 2020 • Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan

We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics.

Gaussian Processes · regression
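The quantity such theories predict analytically can also be measured directly. Below is a hedged empirical sketch of a kernel-regression learning curve (test error versus sample size P) with an RBF kernel and a hypothetical teacher function; kernel, target, and ridge value are illustrative choices.

```python
# Hedged sketch: empirically measure the kernel-regression learning curve
# (test MSE vs. sample size P) that such theories predict analytically.
# Kernel, teacher function, and ridge are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, bandwidth=1.0):
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * bandwidth ** 2))

def teacher(X):
    return np.sin(2 * X[:, 0]) + 0.5 * X[:, 1]         # hypothetical target function

X_test = rng.uniform(-1, 1, size=(500, 2))
y_test = teacher(X_test)

for P in [10, 40, 160, 640]:
    errs = []
    for trial in range(10):                            # average over training-set draws
        X_tr = rng.uniform(-1, 1, size=(P, 2))
        alpha = np.linalg.solve(rbf(X_tr, X_tr) + 1e-6 * np.eye(P), teacher(X_tr))
        errs.append(np.mean((rbf(X_test, X_tr) @ alpha - y_test) ** 2))
    print(f"P={P:4d}  test MSE ~ {np.mean(errs):.4f}")
```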

Pre-Synaptic Pool Modification (PSPM): A Supervised Learning Procedure for Spiking Neural Networks

1 code implementation • 7 Oct 2018 • Bryce Bagley, Blake Bordelon, Benjamin Moseley, Ralf Wessel

Learning synaptic weights of spiking neural network (SNN) models that can reproduce target spike trains from provided neural firing data is a central problem in computational neuroscience and spike-based computing.
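For readers unfamiliar with the model class, here is a minimal leaky integrate-and-fire (LIF) simulation of the kind whose synaptic weights such a procedure would tune to reproduce target spike trains; the weight-update rule itself is not shown, and all parameters are illustrative.

```python
# Minimal leaky integrate-and-fire (LIF) neuron, the kind of spiking model
# whose synaptic weights such a procedure would tune to match target spike
# trains. The weight-update rule is not shown; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
T, dt, tau, v_th = 500, 1.0, 20.0, 1.0                 # steps (ms), step size, time constant, threshold
n_inputs = 50
weights = rng.uniform(0.0, 0.08, n_inputs)             # hypothetical synaptic weights
input_spikes = rng.random((T, n_inputs)) < 0.02        # ~20 Hz Poisson-like input spikes

v, out_spikes = 0.0, []
for t in range(T):
    v += dt * (-v / tau) + weights @ input_spikes[t]   # leak plus synaptic drive
    if v >= v_th:
        out_spikes.append(t)
        v = 0.0                                        # reset after a spike
print(f"{len(out_spikes)} output spikes in {T} ms")
```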
