no code implementations • ICML 2020 • Karthik Abinav Sankararaman, Soham De, Zheng Xu, W. Ronny Huang, Tom Goldstein
Through novel theoretical and experimental results, we show how the neural net architecture affects gradient confusion, and thus the efficiency of training.
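For context on what this snippet measures: gradient confusion is, as best recalled from the paper, a bound on how negatively correlated pairs of per-example gradients can be. The notation below ($f_i$ for per-example losses, $\eta$ for the bound) is a sketch rather than a verbatim quote:

```latex
% Gradient confusion (sketch): the per-example losses f_1, ..., f_n have
% gradient confusion bounded by \eta >= 0 at parameters w if
\langle \nabla f_i(w),\, \nabla f_j(w) \rangle \;\ge\; -\eta \qquad \text{for all } i \neq j .
```

Low gradient confusion (small $\eta$) means minibatch gradients rarely conflict, which is the mechanism the paper ties to faster training.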
1 code implementation • 11 Apr 2024 • Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas
We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture.
no code implementations • 13 Mar 2024 • Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, Charline Le Lan, Christopher A. Choquette-Choo, Clément Crepy, Daniel Cer, Daphne Ippolito, David Reid, Elena Buchatskaya, Eric Ni, Eric Noland, Geng Yan, George Tucker, George-Cristian Muraru, Grigory Rozhdestvenskiy, Henryk Michalewski, Ian Tenney, Ivan Grishchenko, Jacob Austin, James Keeling, Jane Labanowski, Jean-Baptiste Lespiau, Jeff Stanway, Jenny Brennan, Jeremy Chen, Johan Ferret, Justin Chiu, Justin Mao-Jones, Katherine Lee, Kathy Yu, Katie Millican, Lars Lowe Sjoesund, Lisa Lee, Lucas Dixon, Machel Reid, Maciej Mikuła, Mateo Wirth, Michael Sharman, Nikolai Chinaev, Nithum Thain, Olivier Bachem, Oscar Chang, Oscar Wahltinez, Paige Bailey, Paul Michel, Petko Yotov, Rahma Chaabouni, Ramona Comanescu, Reena Jana, Rohan Anil, Ross McIlroy, Ruibo Liu, Ryan Mullins, Samuel L Smith, Sebastian Borgeaud, Sertan Girgin, Sholto Douglas, Shree Pandya, Siamak Shakeri, Soham De, Ted Klimenko, Tom Hennigan, Vlad Feinberg, Wojciech Stokowiec, Yu-Hui Chen, Zafarali Ahmed, Zhitao Gong, Tris Warkentin, Ludovic Peran, Minh Giang, Clément Farabet, Oriol Vinyals, Jeff Dean, Koray Kavukcuoglu, Demis Hassabis, Zoubin Ghahramani, Douglas Eck, Joelle Barral, Fernando Pereira, Eli Collins, Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, Kathleen Kenealy
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models.
2 code implementations • 29 Feb 2024 • Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George-Cristian Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Caglar Gulcehre
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale.
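As a hedged sketch of the mechanism behind this entry: Griffin's recurrent block is built on a real-gated linear recurrent unit (RG-LRU). The form below is reconstructed from memory of the paper, so details such as biases, the fixed exponent constant $c$, and the parameterisation of the learnable decay $a$ should be treated as assumptions:

```latex
% RG-LRU recurrence (sketch; \sigma is the sigmoid, \odot elementwise product):
r_t = \sigma(W_a x_t), \qquad i_t = \sigma(W_x x_t), \qquad a_t = a^{\,c\, r_t},
\qquad h_t = a_t \odot h_{t-1} + \sqrt{1 - a_t^2} \odot (i_t \odot x_t).
```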
no code implementations • 25 Oct 2023 • Samuel L. Smith, Andrew Brock, Leonard Berrada, Soham De
Many researchers believe that ConvNets perform well on small or moderately sized datasets, but are not competitive with Vision Transformers when given access to web-scale datasets.
2 code implementations • 21 Aug 2023 • Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle
The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy-preserving machine learning in industry.
no code implementations • 21 Jul 2023 • Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith
Deep neural networks based on linear complex-valued RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches to sequence modeling.
8 code implementations • 11 Mar 2023 • Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train.
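A minimal NumPy sketch of the diagonal, complex-valued linear recurrence this line of work studies. The stable exponential parameterisation of the eigenvalues follows the paper's idea, but the shapes, the phase parameterisation, and the omission of the paper's input-normalisation factor are simplifying assumptions here:

```python
import numpy as np

def linear_recurrent_unit(u, nu_log, theta, B, C, D):
    """Sketch of a diagonal, complex-valued linear recurrence in the spirit
    of the LRU: x_k = diag(lambda) x_{k-1} + B u_k, y_k = Re(C x_k) + D u_k.
    The parameterisation lambda = exp(-exp(nu_log) + i*theta) keeps
    |lambda| < 1 for stability. Names and shapes are illustrative."""
    lam = np.exp(-np.exp(nu_log) + 1j * theta)   # (N,) diagonal recurrence
    x = np.zeros_like(lam)                        # hidden state, (N,) complex
    ys = []
    for u_k in u:                                 # u: (T, H) input sequence
        x = lam * x + B @ u_k                     # elementwise linear recurrence
        ys.append((C @ x).real + D @ u_k)         # real-valued readout
    return np.stack(ys)

# Example: sequence length 12, input/output dim 3, state dim 8.
rng = np.random.default_rng(0)
T, H, N = 12, 3, 8
y = linear_recurrent_unit(
    rng.normal(size=(T, H)),
    nu_log=rng.normal(size=N), theta=rng.normal(size=N),
    B=rng.normal(size=(N, H)) + 1j * rng.normal(size=(N, H)),
    C=rng.normal(size=(H, N)) + 1j * rng.normal(size=(H, N)),
    D=rng.normal(size=(H, H)),
)
```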
no code implementations • 27 Feb 2023 • Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle
By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17 in terms of both FID and the accuracy of downstream classifiers trained on synthetic data.
2 code implementations • 28 Apr 2022 • Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle
Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points.
Tasks: Classification, Image Classification with Differential Privacy, +1
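For reference, the formal guarantee this snippet refers to is standard $(\varepsilon, \delta)$-differential privacy:

```latex
% A randomised mechanism M is (\varepsilon, \delta)-DP if, for all adjacent
% datasets D, D' (differing in a single record) and all measurable sets S,
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(D') \in S] \;+\; \delta .
```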
no code implementations • 7 Mar 2022 • Aleksander Botev, Matthias Bauer, Soham De
Data augmentation is used in machine learning to make the classifier invariant to label-preserving transformations.
no code implementations • 31 May 2021 • Tudor Berariu, Wojciech Czarnecki, Soham De, Jorg Bornschein, Samuel Smith, Razvan Pascanu, Claudia Clopath
One aim shared by multiple settings, such as continual learning or transfer learning, is to leverage previously acquired knowledge to converge faster on the current task.
no code implementations • 27 May 2021 • Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith
In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences model performance on held-out data when training deep ResNets.
Ranked #124 on Image Classification on ImageNet
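The core recipe this entry studies, averaging the loss over several augmented copies of each unique image, is simple enough to sketch. The function names and the toy augmentation below are illustrative placeholders, not the paper's setup:

```python
import numpy as np

def augmult_loss(x, y, augment, loss_fn, num_augs=4):
    """Sketch of augmentation multiplicity: draw several augmented copies
    of one example and average their losses, so each unique image
    contributes a lower-variance gradient signal."""
    return np.mean([loss_fn(augment(x), y) for _ in range(num_augs)])

# Example with a toy "augmentation" (additive noise) and squared loss.
rng = np.random.default_rng(0)
x, y = np.ones(8), 1.0
val = augmult_loss(x, y,
                   augment=lambda v: v + 0.1 * rng.normal(size=v.shape),
                   loss_fn=lambda v, t: float(np.mean((v - t) ** 2)))
```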
19 code implementations • 11 Feb 2021 • Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan
Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples.
Ranked #31 on Image Classification on ImageNet
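One ingredient of the normalizer-free networks in this entry is adaptive gradient clipping (AGC), which rescales gradients unit-wise whenever the ratio of gradient norm to parameter norm grows too large. The NumPy sketch below renders that idea; the threshold and epsilon defaults are illustrative assumptions rather than the paper's tuned values:

```python
import numpy as np

def adaptive_gradient_clip(grad, weight, clip=0.01, eps=1e-3):
    """Unit-wise adaptive gradient clipping (AGC sketch): rescale each
    output unit's gradient whenever the ratio of its gradient norm to its
    weight norm exceeds `clip`. Arrays have shape (fan_out, fan_in)."""
    w_norm = np.maximum(np.linalg.norm(weight, axis=1, keepdims=True), eps)
    g_norm = np.linalg.norm(grad, axis=1, keepdims=True)
    max_norm = clip * w_norm  # largest allowed gradient norm per unit
    scale = np.where(g_norm > max_norm,
                     max_norm / np.maximum(g_norm, 1e-12), 1.0)
    return grad * scale

# Example: clip a deliberately large gradient for a 4x8 weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
G = 10.0 * rng.normal(size=(4, 8))
G_clipped = adaptive_gradient_clip(G, W)
```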
no code implementations • ICLR 2021 • Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De
To interpret this phenomenon, we prove that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small and finite, but on a modified loss.
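Concretely, the modified loss has (as best recalled from the paper, so treat the constant as an assumption) the form below, where $\epsilon$ is the learning rate and $\hat{C}_k$ are the $m$ minibatch losses per epoch whose mean is the full loss $C$:

```latex
% Implicit regularisation of SGD with random shuffling (sketch): the mean
% iterate follows gradient flow on the modified loss
\widetilde{C}(\omega) \;=\; C(\omega) \;+\; \frac{\epsilon}{4m} \sum_{k=1}^{m} \big\lVert \nabla \hat{C}_k(\omega) \big\rVert^2 .
```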
4 code implementations • ICLR 2021 • Andrew Brock, Soham De, Samuel L. Smith
Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs.
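A central tool in this line of work is Scaled Weight Standardization, which recovers batch normalization's effect on signal propagation by standardizing each output unit's incoming weights. The sketch below is a minimal NumPy rendering; the exact scaling convention and the handling of the gain are assumptions:

```python
import numpy as np

def scaled_weight_standardization(W, gain):
    """Sketch of Scaled Weight Standardization: standardize each output
    unit's fan-in weights to zero mean and variance 1/fan_in, then apply a
    per-unit gain. W has shape (fan_out, fan_in)."""
    fan_in = W.shape[1]
    mean = W.mean(axis=1, keepdims=True)
    var = W.var(axis=1, keepdims=True)
    W_hat = (W - mean) / np.sqrt(np.maximum(var * fan_in, 1e-8))
    return gain * W_hat

# Example usage on a 16x64 weight matrix with unit gains.
rng = np.random.default_rng(0)
W_std = scaled_weight_standardization(rng.normal(size=(16, 64)),
                                      gain=np.ones((16, 1)))
```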
3 code implementations • 20 Oct 2020 • Pierre H. Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, Andrew Brock, Samuel Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko
Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation.
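For context, BYOL's objective is a normalised mean squared error between an online network's prediction $q$ and a target network's projection $z'$ of two augmented views, with the target weights maintained as an exponential moving average of the online weights:

```latex
% BYOL loss (standard form), with \bar{v} = v / \lVert v \rVert_2:
\mathcal{L} \;=\; \lVert \bar{q} - \bar{z}' \rVert_2^2
\;=\; 2 \;-\; 2\,\frac{\langle q,\, z' \rangle}{\lVert q \rVert_2\, \lVert z' \rVert_2} .
```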
no code implementations • ICML 2020 • Samuel L. Smith, Erich Elsen, Soham De
It has long been argued that minibatch stochastic gradient descent can generalize better than large batch gradient descent in deep neural networks.
no code implementations • NeurIPS 2020 • Soham De, Samuel L. Smith
Batch normalization dramatically increases the largest trainable depth of residual networks, and this benefit has been crucial to the empirical success of deep residual networks on a wide range of benchmarks.
no code implementations • 25 Sep 2019 • Karthik A. Sankararaman, Soham De, Zheng Xu, W. Ronny Huang, Tom Goldstein
Through novel theoretical and experimental results, we show how the neural net architecture affects gradient confusion, and thus the efficiency of training.
no code implementations • 25 Sep 2019 • Samuel L Smith, Erich Elsen, Soham De
First, we argue that stochastic gradient descent exhibits two regimes with different behaviours: a noise-dominated regime, which typically arises for small or moderate batch sizes, and a curvature-dominated regime, which typically arises when the batch size is large.
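A practical rule of thumb implied by these two regimes (a sketch of the claim, with $B^{*}$ denoting an assumed critical batch size) is that the optimal learning rate scales linearly with batch size until curvature takes over:

```latex
\epsilon_{\mathrm{opt}}(B) \;\propto\; B \quad (B \ll B^{*}), \qquad
\epsilon_{\mathrm{opt}}(B) \;\approx\; \text{constant} \quad (B \gg B^{*}).
```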
no code implementations • 25 Sep 2019 • Soham De, Samuel L Smith
This initialization scheme outperforms batch normalization when the batch size is very small, and is competitive with batch normalization for batch sizes that are not too large.
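The initialization scheme in question, SkipInit, can be sketched in a few lines: place a learnable scalar on the residual branch and initialise it at (or near) zero, so every block starts close to the identity. The code below is a toy NumPy illustration of that idea, not the paper's implementation:

```python
import numpy as np

def skipinit_residual_block(x, f, alpha=0.0):
    """Sketch of a SkipInit-style residual block: the residual branch f(x)
    is scaled by a learnable scalar alpha initialised at (or near) zero,
    mimicking one effect of batch normalization at initialisation."""
    return x + alpha * f(x)

# With alpha = 0 the block is exactly the identity at initialisation.
x = np.ones(4)
out = skipinit_residual_block(x, f=lambda v: v ** 2, alpha=0.0)
assert np.allclose(out, x)
```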
no code implementations • NeurIPS 2019 • Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, Pushmeet Kohli
Using this regularizer, we exceed current state of the art and achieve 47% adversarial accuracy for ImageNet with $\ell_\infty$ adversarial perturbations of radius 4/255 under an untargeted, strong, white-box attack.
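The regularizer referred to here penalises, as best recalled from the paper, how far the loss deviates from its first-order Taylor expansion inside the perturbation ball; the precise form below is a reconstruction and should be treated as a sketch:

```latex
% Local linearity measure (sketch), for loss \ell and perturbation radius \varepsilon:
\gamma(\varepsilon, x) \;=\; \max_{\lVert \delta \rVert \le \varepsilon}
\big|\, \ell(x + \delta) - \ell(x) - \delta^{\top} \nabla_x \ell(x) \,\big| .
```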
no code implementations • 15 Apr 2019 • Karthik A. Sankararaman, Soham De, Zheng Xu, W. Ronny Huang, Tom Goldstein
Our results show that, for popular initialization techniques, increasing the width of neural networks leads to lower gradient confusion, and thus faster model training.
no code implementations • ICLR 2019 • Soham De, Anirbit Mukherjee, Enayat Ullah
Through these experiments we demonstrate the interesting sensitivity that ADAM has to its momentum parameter $\beta_1$.
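For reference, the standard Adam update shows exactly where $\beta_1$ enters, as the decay rate of the first-moment (momentum) estimate:

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2,
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad
w_{t+1} = w_t - \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}.
```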
no code implementations • NeurIPS 2017 • Hao Li, Soham De, Zheng Xu, Christoph Studer, Hanan Samet, Tom Goldstein
Currently, deep neural networks are deployed on low-power portable devices by first training a full-precision model using powerful hardware, and then deriving a corresponding low-precision model for efficient inference on such systems.
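One scheme analysed in this line of work is BinaryConnect-style training: quantize the weights for the forward and backward pass, but accumulate updates in a full-precision copy. The NumPy sketch below illustrates one such step; the learning rate, clipping range, and `grad_fn` interface are illustrative assumptions:

```python
import numpy as np

def binaryconnect_step(w_fp, grad_fn, lr=0.01):
    """Sketch of one BinaryConnect-style step: quantize weights to 1 bit
    for the forward/backward pass, then update the full-precision copy.
    `grad_fn` maps (quantized) weights to a gradient."""
    w_q = np.sign(w_fp)                        # 1-bit weights for this pass
    g = grad_fn(w_q)                           # gradient at quantized weights
    return np.clip(w_fp - lr * g, -1.0, 1.0)   # update full-precision copy

# Example: one step on a toy quadratic loss 0.5 * ||w - t||^2.
t = np.array([0.3, -0.7, 0.9])
w = binaryconnect_step(np.zeros(3), grad_fn=lambda wq: wq - t)
```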
no code implementations • 9 Jan 2017 • Carlos Castillo, Soham De, Xintong Han, Bharat Singh, Abhay Kumar Yadav, Tom Goldstein
This work considers targeted style transfer, in which the style of a template image is used to alter only part of a target image.
no code implementations • 10 Dec 2016 • Zheng Xu, Soham De, Mario Figueiredo, Christoph Studer, Tom Goldstein
The alternating direction method of multipliers (ADMM) is a common optimization tool for solving constrained and non-differentiable problems.
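For reference, the standard (scaled-dual) ADMM iteration for $\min_{x,z} f(x) + g(z)$ subject to $Ax + Bz = c$, with penalty parameter $\rho$ and scaled dual variable $u$:

```latex
x^{k+1} = \arg\min_x\; f(x) + \tfrac{\rho}{2}\,\lVert Ax + Bz^{k} - c + u^{k} \rVert_2^2,
z^{k+1} = \arg\min_z\; g(z) + \tfrac{\rho}{2}\,\lVert Ax^{k+1} + Bz - c + u^{k} \rVert_2^2,
u^{k+1} = u^{k} + Ax^{k+1} + Bz^{k+1} - c .
```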
no code implementations • 18 Oct 2016 • Soham De, Abhay Yadav, David Jacobs, Tom Goldstein
The high-fidelity gradients enable automated learning rate selection and do not require stepsize decay.
no code implementations • 9 Dec 2015 • Soham De, Tom Goldstein
Stochastic Gradient Descent (SGD) has become one of the most popular optimization methods for training machine learning models on massive datasets.
no code implementations • 5 Dec 2015 • Soham De, Gavin Taylor, Tom Goldstein
Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger, constant stepsizes and preserving linear convergence rates.
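A typical variance-reduced gradient estimator of the kind this entry builds on (shown here in the SVRG form; the specific method studied in the paper may differ) uses an occasionally refreshed snapshot $\tilde{w}$ and its full gradient:

```latex
% Unbiased VR estimator whose variance vanishes as w_t, \tilde{w} -> w^*:
g_t \;=\; \nabla f_{i_t}(w_t) \;-\; \nabla f_{i_t}(\tilde{w}) \;+\; \nabla F(\tilde{w}) .
```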
no code implementations • 15 Oct 2015 • Bharat Singh, Soham De, Yangmuzi Zhang, Thomas Goldstein, Gavin Taylor
In this paper, we attempt to overcome the two problems above by proposing an optimization method for training deep neural networks whose learning rates are both specific to each layer of the network and adaptive to the curvature of the loss function, increasing the learning rate at points of low curvature.
no code implementations • 27 Feb 2015 • Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj
Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright.