no code implementations • ICML 2020 • Karthik Abinav Sankararaman, Soham De, Zheng Xu, W. Ronny Huang, Tom Goldstein
Through novel theoretical and experimental results, we show how the neural net architecture affects gradient confusion, and thus the efficiency of training.
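For context on what this snippet measures: gradient confusion is, as best recalled from the paper, a bound on how negatively correlated pairs of per-example gradients can be. The notation below ($f_i$ for per-example losses, $\eta$ for the bound) is a sketch rather than a verbatim quote:

```latex
% Gradient confusion (sketch): the per-example losses f_1, ..., f_n have
% gradient confusion bounded by \eta >= 0 at parameters w if
\langle \nabla f_i(w),\, \nabla f_j(w) \rangle \;\ge\; -\eta \qquad \text{for all } i \neq j .
```

Low gradient confusion (small $\eta$) means minibatch gradients rarely conflict, which is the mechanism the paper ties to faster training.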
1 code implementation • 11 Apr 2024 • Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas
We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture.
no code implementations • 13 Mar 2024 • Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, Charline Le Lan, Christopher A. Choquette-Choo, Clément Crepy, Daniel Cer, Daphne Ippolito, David Reid, Elena Buchatskaya, Eric Ni, Eric Noland, Geng Yan, George Tucker, George-Cristian Muraru, Grigory Rozhdestvenskiy, Henryk Michalewski, Ian Tenney, Ivan Grishchenko, Jacob Austin, James Keeling, Jane Labanowski, Jean-Baptiste Lespiau, Jeff Stanway, Jenny Brennan, Jeremy Chen, Johan Ferret, Justin Chiu, Justin Mao-Jones, Katherine Lee, Kathy Yu, Katie Millican, Lars Lowe Sjoesund, Lisa Lee, Lucas Dixon, Machel Reid, Maciej Mikuła, Mateo Wirth, Michael Sharman, Nikolai Chinaev, Nithum Thain, Olivier Bachem, Oscar Chang, Oscar Wahltinez, Paige Bailey, Paul Michel, Petko Yotov, Rahma Chaabouni, Ramona Comanescu, Reena Jana, Rohan Anil, Ross McIlroy, Ruibo Liu, Ryan Mullins, Samuel L Smith, Sebastian Borgeaud, Sertan Girgin, Sholto Douglas, Shree Pandya, Siamak Shakeri, Soham De, Ted Klimenko, Tom Hennigan, Vlad Feinberg, Wojciech Stokowiec, Yu-Hui Chen, Zafarali Ahmed, Zhitao Gong, Tris Warkentin, Ludovic Peran, Minh Giang, Clément Farabet, Oriol Vinyals, Jeff Dean, Koray Kavukcuoglu, Demis Hassabis, Zoubin Ghahramani, Douglas Eck, Joelle Barral, Fernando Pereira, Eli Collins, Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, Kathleen Kenealy
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models.
2 code implementations • 29 Feb 2024 • Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George-Cristian Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Caglar Gulcehre
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale.
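As a hedged sketch of the mechanism behind this entry: Griffin's recurrent block is built on a real-gated linear recurrent unit (RG-LRU). The form below is reconstructed from memory of the paper, so details such as biases, the fixed exponent constant $c$, and the parameterisation of the learnable decay $a$ should be treated as assumptions:

```latex
% RG-LRU recurrence (sketch; \sigma is the sigmoid, \odot elementwise product):
r_t = \sigma(W_a x_t), \qquad i_t = \sigma(W_x x_t), \qquad a_t = a^{\,c\, r_t},
\qquad h_t = a_t \odot h_{t-1} + \sqrt{1 - a_t^2} \odot (i_t \odot x_t).
```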
no code implementations • 25 Oct 2023 • Samuel L. Smith, Andrew Brock, Leonard Berrada, Soham De
Many researchers believe that ConvNets perform well on small or moderately sized datasets, but are not competitive with Vision Transformers when given access to web-scale datasets.
2 code implementations • 21 Aug 2023 • Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle
The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy-preserving machine learning in industry.
no code implementations • 21 Jul 2023 • Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith
Deep neural networks based on linear complex-valued RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches to sequence modeling.
8 code implementations • 11 Mar 2023 • Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train.
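A minimal NumPy sketch of the diagonal, complex-valued linear recurrence this line of work studies. The stable exponential parameterisation of the eigenvalues follows the paper's idea, but the shapes, the phase parameterisation, and the omission of the paper's input-normalisation factor are simplifying assumptions here:

```python
import numpy as np

def linear_recurrent_unit(u, nu_log, theta, B, C, D):
    """Sketch of a diagonal, complex-valued linear recurrence in the spirit
    of the LRU: x_k = diag(lambda) x_{k-1} + B u_k, y_k = Re(C x_k) + D u_k.
    The parameterisation lambda = exp(-exp(nu_log) + i*theta) keeps
    |lambda| < 1 for stability. Names and shapes are illustrative."""
    lam = np.exp(-np.exp(nu_log) + 1j * theta)   # (N,) diagonal recurrence
    x = np.zeros_like(lam)                        # hidden state, (N,) complex
    ys = []
    for u_k in u:                                 # u: (T, H) input sequence
        x = lam * x + B @ u_k                     # elementwise linear recurrence
        ys.append((C @ x).real + D @ u_k)         # real-valued readout
    return np.stack(ys)

# Example: sequence length 12, input/output dim 3, state dim 8.
rng = np.random.default_rng(0)
T, H, N = 12, 3, 8
y = linear_recurrent_unit(
    rng.normal(size=(T, H)),
    nu_log=rng.normal(size=N), theta=rng.normal(size=N),
    B=rng.normal(size=(N, H)) + 1j * rng.normal(size=(N, H)),
    C=rng.normal(size=(H, N)) + 1j * rng.normal(size=(H, N)),
    D=rng.normal(size=(H, H)),
)
```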
no code implementations • 27 Feb 2023 • Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle
By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17 in terms of both FID and the accuracy of downstream classifiers trained on synthetic data.
2 code implementations • 28 Apr 2022 • Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle
Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points.
Tasks: Classification, Image Classification with Differential Privacy, +1
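For reference, the formal guarantee this snippet refers to is standard $(\varepsilon, \delta)$-differential privacy:

```latex
% A randomised mechanism M is (\varepsilon, \delta)-DP if, for all adjacent
% datasets D, D' (differing in a single record) and all measurable sets S,
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(D') \in S] \;+\; \delta .
```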
no code implementations • 7 Mar 2022 • Aleksander Botev, Matthias Bauer, Soham De
Data augmentation is used in machine learning to make the classifier invariant to label-preserving transformations.
no code implementations • 31 May 2021 • Tudor Berariu, Wojciech Czarnecki, Soham De, Jorg Bornschein, Samuel Smith, Razvan Pascanu, Claudia Clopath
One aim shared by multiple settings, such as continual learning or transfer learning, is to leverage previously acquired knowledge to converge faster on the current task.
no code implementations • 27 May 2021 • Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith
In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences model performance on held-out data when training deep ResNets.
Ranked #124 on Image Classification on ImageNet
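The core recipe this entry studies, averaging the loss over several augmented copies of each unique image, is simple enough to sketch. The function names and the toy augmentation below are illustrative placeholders, not the paper's setup:

```python
import numpy as np

def augmult_loss(x, y, augment, loss_fn, num_augs=4):
    """Sketch of augmentation multiplicity: draw several augmented copies
    of one example and average their losses, so each unique image
    contributes a lower-variance gradient signal."""
    return np.mean([loss_fn(augment(x), y) for _ in range(num_augs)])

# Example with a toy "augmentation" (additive noise) and squared loss.
rng = np.random.default_rng(0)
x, y = np.ones(8), 1.0
val = augmult_loss(x, y,
                   augment=lambda v: v + 0.1 * rng.normal(size=v.shape),
                   loss_fn=lambda v, t: float(np.mean((v - t) ** 2)))
```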
19 code implementations • 11 Feb 2021 • Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan
Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples.
Ranked #31 on Image Classification on ImageNet
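One ingredient of the normalizer-free networks in this entry is adaptive gradient clipping (AGC), which rescales gradients unit-wise whenever the ratio of gradient norm to parameter norm grows too large. The NumPy sketch below renders that idea; the threshold and epsilon defaults are illustrative assumptions rather than the paper's tuned values:

```python
import numpy as np

def adaptive_gradient_clip(grad, weight, clip=0.01, eps=1e-3):
    """Unit-wise adaptive gradient clipping (AGC sketch): rescale each
    output unit's gradient whenever the ratio of its gradient norm to its
    weight norm exceeds `clip`. Arrays have shape (fan_out, fan_in)."""
    w_norm = np.maximum(np.linalg.norm(weight, axis=1, keepdims=True), eps)
    g_norm = np.linalg.norm(grad, axis=1, keepdims=True)
    max_norm = clip * w_norm  # largest allowed gradient norm per unit
    scale = np.where(g_norm > max_norm,
                     max_norm / np.maximum(g_norm, 1e-12), 1.0)
    return grad * scale

# Example: clip a deliberately large gradient for a 4x8 weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
G = 10.0 * rng.normal(size=(4, 8))
G_clipped = adaptive_gradient_clip(G, W)
```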
no code implementations • ICLR 2021 • Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De
To interpret this phenomenon, we prove that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small and finite, but on a modified loss.
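Concretely, the modified loss has (as best recalled from the paper, so treat the constant as an assumption) the form below, where $\epsilon$ is the learning rate and $\hat{C}_k$ are the $m$ minibatch losses per epoch whose mean is the full loss $C$:

```latex
% Implicit regularisation of SGD with random shuffling (sketch): the mean
% iterate follows gradient flow on the modified loss
\widetilde{C}(\omega) \;=\; C(\omega) \;+\; \frac{\epsilon}{4m} \sum_{k=1}^{m} \big\lVert \nabla \hat{C}_k(\omega) \big\rVert^2 .
```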
4 code implementations • ICLR 2021 • Andrew Brock, Soham De, Samuel L. Smith
Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs.
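A central tool in this line of work is Scaled Weight Standardization, which recovers batch normalization's effect on signal propagation by standardizing each output unit's incoming weights. The sketch below is a minimal NumPy rendering; the exact scaling convention and the handling of the gain are assumptions:

```python
import numpy as np

def scaled_weight_standardization(W, gain):
    """Sketch of Scaled Weight Standardization: standardize each output
    unit's fan-in weights to zero mean and variance 1/fan_in, then apply a
    per-unit gain. W has shape (fan_out, fan_in)."""
    fan_in = W.shape[1]
    mean = W.mean(axis=1, keepdims=True)
    var = W.var(axis=1, keepdims=True)
    W_hat = (W - mean) / np.sqrt(np.maximum(var * fan_in, 1e-8))
    return gain * W_hat

# Example usage on a 16x64 weight matrix with unit gains.
rng = np.random.default_rng(0)
W_std = scaled_weight_standardization(rng.normal(size=(16, 64)),
                                      gain=np.ones((16, 1)))
```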
3 code implementations • 20 Oct 2020 • Pierre H. Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, Andrew Brock, Samuel Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko
Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation.
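For context, BYOL's objective is a normalised mean squared error between an online network's prediction $q$ and a target network's projection $z'$ of two augmented views, with the target weights maintained as an exponential moving average of the online weights:

```latex
% BYOL loss (standard form), with \bar{v} = v / \lVert v \rVert_2:
\mathcal{L} \;=\; \lVert \bar{q} - \bar{z}' \rVert_2^2
\;=\; 2 \;-\; 2\,\frac{\langle q,\, z' \rangle}{\lVert q \rVert_2\, \lVert z' \rVert_2} .
```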
no code implementations • ICML 2020 • Samuel L. Smith, Erich Elsen, Soham De
It has long been argued that minibatch stochastic gradient descent can generalize better than large batch gradient descent in deep neural networks.
no code implementations • NeurIPS 2020 • Soham De, Samuel L. Smith
Batch normalization dramatically increases the largest trainable depth of residual networks, and this benefit has been crucial to the empirical success of deep residual networks on a wide range of benchmarks.
no code implementations • 25 Sep 2019 • Karthik A. Sankararaman, Soham De, Zheng Xu, W. Ronny Huang, Tom Goldstein
Through novel theoretical and experimental results, we show how the neural net architecture affects gradient confusion, and thus the efficiency of training.
no code implementations • 25 Sep 2019 • Samuel L Smith, Erich Elsen, Soham De
First, we argue that stochastic gradient descent exhibits two regimes with different behaviours: a noise-dominated regime, which typically arises for small or moderate batch sizes, and a curvature-dominated regime, which typically arises when the batch size is large.
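A practical rule of thumb implied by these two regimes (a sketch of the claim, with $B^{*}$ denoting an assumed critical batch size) is that the optimal learning rate scales linearly with batch size until curvature takes over:

```latex
\epsilon_{\mathrm{opt}}(B) \;\propto\; B \quad (B \ll B^{*}), \qquad
\epsilon_{\mathrm{opt}}(B) \;\approx\; \text{constant} \quad (B \gg B^{*}).
```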
no code implementations • 25 Sep 2019 • Soham De, Samuel L Smith
This initialization scheme outperforms batch normalization when the batch size is very small, and is competitive with batch normalization for batch sizes that are not too large.
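The initialization scheme in question, SkipInit, can be sketched in a few lines: place a learnable scalar on the residual branch and initialise it at (or near) zero, so every block starts close to the identity. The code below is a toy NumPy illustration of that idea, not the paper's implementation:

```python
import numpy as np

def skipinit_residual_block(x, f, alpha=0.0):
    """Sketch of a SkipInit-style residual block: the residual branch f(x)
    is scaled by a learnable scalar alpha initialised at (or near) zero,
    mimicking one effect of batch normalization at initialisation."""
    return x + alpha * f(x)

# With alpha = 0 the block is exactly the identity at initialisation.
x = np.ones(4)
out = skipinit_residual_block(x, f=lambda v: v ** 2, alpha=0.0)
assert np.allclose(out, x)
```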
no code implementations • NeurIPS 2019 • Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, Pushmeet Kohli
Using this regularizer, we exceed current state of the art and achieve 47% adversarial accuracy for ImageNet with $\ell_\infty$ adversarial perturbations of radius 4/255 under an untargeted, strong, white-box attack.
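The regularizer referred to here penalises, as best recalled from the paper, how far the loss deviates from its first-order Taylor expansion inside the perturbation ball; the precise form below is a reconstruction and should be treated as a sketch:

```latex
% Local linearity measure (sketch), for loss \ell and perturbation radius \varepsilon:
\gamma(\varepsilon, x) \;=\; \max_{\lVert \delta \rVert \le \varepsilon}
\big|\, \ell(x + \delta) - \ell(x) - \delta^{\top} \nabla_x \ell(x) \,\big| .
```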
no code implementations • 15 Apr 2019 • Karthik A. Sankararaman, Soham De, Zheng Xu, W. Ronny Huang, Tom Goldstein
Our results show that, for popular initialization techniques, increasing the width of neural networks leads to lower gradient confusion, and thus faster model training.
no code implementations • ICLR 2019 • Soham De, Anirbit Mukherjee, Enayat Ullah
Through these experiments we demonstrate the interesting sensitivity that ADAM has to its momentum parameter $\beta_1$.
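For reference, the standard Adam update shows exactly where $\beta_1$ enters, as the decay rate of the first-moment (momentum) estimate:

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2,
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad
w_{t+1} = w_t - \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}.
```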
no code implementations • NeurIPS 2017 • Hao Li, Soham De, Zheng Xu, Christoph Studer, Hanan Samet, Tom Goldstein
Currently, deep neural networks are deployed on low-power portable devices by first training a full-precision model using powerful hardware, and then deriving a corresponding low-precision model for efficient inference on such systems.
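One scheme analysed in this line of work is BinaryConnect-style training: quantize the weights for the forward and backward pass, but accumulate updates in a full-precision copy. The NumPy sketch below illustrates one such step; the learning rate, clipping range, and `grad_fn` interface are illustrative assumptions:

```python
import numpy as np

def binaryconnect_step(w_fp, grad_fn, lr=0.01):
    """Sketch of one BinaryConnect-style step: quantize weights to 1 bit
    for the forward/backward pass, then update the full-precision copy.
    `grad_fn` maps (quantized) weights to a gradient."""
    w_q = np.sign(w_fp)                        # 1-bit weights for this pass
    g = grad_fn(w_q)                           # gradient at quantized weights
    return np.clip(w_fp - lr * g, -1.0, 1.0)   # update full-precision copy

# Example: one step on a toy quadratic loss 0.5 * ||w - t||^2.
t = np.array([0.3, -0.7, 0.9])
w = binaryconnect_step(np.zeros(3), grad_fn=lambda wq: wq - t)
```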
no code implementations • 9 Jan 2017 • Carlos Castillo, Soham De, Xintong Han, Bharat Singh, Abhay Kumar Yadav, Tom Goldstein
This work considers targeted style transfer, in which the style of a template image is used to alter only part of a target image.
no code implementations • 10 Dec 2016 • Zheng Xu, Soham De, Mario Figueiredo, Christoph Studer, Tom Goldstein
The alternating direction method of multipliers (ADMM) is a common optimization tool for solving constrained and non-differentiable problems.
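For reference, the standard (scaled-dual) ADMM iteration for $\min_{x,z} f(x) + g(z)$ subject to $Ax + Bz = c$, with penalty parameter $\rho$ and scaled dual variable $u$:

```latex
x^{k+1} = \arg\min_x\; f(x) + \tfrac{\rho}{2}\,\lVert Ax + Bz^{k} - c + u^{k} \rVert_2^2,
z^{k+1} = \arg\min_z\; g(z) + \tfrac{\rho}{2}\,\lVert Ax^{k+1} + Bz - c + u^{k} \rVert_2^2,
u^{k+1} = u^{k} + Ax^{k+1} + Bz^{k+1} - c .
```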
no code implementations • 18 Oct 2016 • Soham De, Abhay Yadav, David Jacobs, Tom Goldstein
The high-fidelity gradients enable automated learning rate selection and do not require stepsize decay.
no code implementations • 9 Dec 2015 • Soham De, Tom Goldstein
Stochastic Gradient Descent (SGD) has become one of the most popular optimization methods for training machine learning models on massive datasets.
no code implementations • 5 Dec 2015 • Soham De, Gavin Taylor, Tom Goldstein
Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger, constant stepsizes and preserving linear convergence rates.
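A typical variance-reduced gradient estimator of the kind this entry builds on (shown here in the SVRG form; the specific method studied in the paper may differ) uses an occasionally refreshed snapshot $\tilde{w}$ and its full gradient:

```latex
% Unbiased VR estimator whose variance vanishes as w_t, \tilde{w} -> w^*:
g_t \;=\; \nabla f_{i_t}(w_t) \;-\; \nabla f_{i_t}(\tilde{w}) \;+\; \nabla F(\tilde{w}) .
```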
no code implementations • 15 Oct 2015 • Bharat Singh, Soham De, Yangmuzi Zhang, Thomas Goldstein, Gavin Taylor
In this paper, we attempt to overcome the two problems above by proposing an optimization method for training deep neural networks whose learning rates are both specific to each layer of the network and adaptive to the curvature of the loss function, increasing the learning rate at points of low curvature.
no code implementations • 27 Feb 2015 • Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj
Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright.