Search Results for author: Atli Kosson

Found 7 papers, 4 papers with code

Memory Efficient Mixed-Precision Optimizers

no code implementations • 21 Sep 2023 • Basile Lewandowski, Atli Kosson

Traditional optimization methods rely on the use of single-precision floating point arithmetic, which can be costly in terms of memory size and computing power.

Multiplication-Free Transformer Training via Piecewise Affine Operations

1 code implementation • NeurIPS 2023 • Atli Kosson, Martin Jaggi

Finally, we show that we can eliminate all multiplications in the entire training process, including operations in the forward pass, backward pass and optimizer update, demonstrating the first successful training of modern neural network architectures in a fully multiplication-free fashion.

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

2 code implementations • 26 May 2023 • Atli Kosson, Bettina Messmer, Martin Jaggi

This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks through a combination of applied analysis and experimentation.

L2 Regularization

Ghost Noise for Regularizing Deep Neural Networks

1 code implementation • 26 May 2023 • Atli Kosson, Dongyang Fan, Martin Jaggi

Batch Normalization (BN) is widely used to stabilize the optimization process and improve the test performance of deep neural networks.

Adaptive Braking for Mitigating Gradient Delay

no code implementations • 2 Jul 2020 • Abhinav Venigalla, Atli Kosson, Vitaliy Chiley, Urs Köster

Neural network training is commonly accelerated by using multiple synchronized workers to compute gradient updates in parallel.

Pipelined Backpropagation at Scale: Training Large Models without Batches

no code implementations • 25 Mar 2020 • Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Köster

New hardware can substantially increase the speed and efficiency of deep neural network training.

Image Classification • Stochastic Optimization

Online Normalization for Training Neural Networks

1 code implementation • NeurIPS 2019 • Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Koster, Ryan Reece, Sofia Samaniego de la Fuente, Vishal Subbiah, Michael James

Online Normalization is a new technique for normalizing the hidden activations of a neural network.

General Classification • Image Classification
