1 code implementation • 9 Jan 2024 • Amro Abbas, Evgenia Rusak, Kushal Tirumala, Wieland Brendel, Kamalika Chaudhuri, Ari S. Morcos
Using a simple and intuitive complexity measure, we are able to reduce training cost to a quarter of that of regular training.
no code implementations • 5 Dec 2023 • Yu Yang, Aaditya K. Singh, Mostafa Elhoushi, Anas Mahmoud, Kushal Tirumala, Fabian Gloeckle, Baptiste Rozière, Carole-Jean Wu, Ari S. Morcos, Newsha Ardalani
Armed with this knowledge, we devise novel pruning metrics that operate in embedding space to identify and remove low-quality entries in the Stack dataset.
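The snippet does not spell out the metrics themselves, so the following is only a minimal sketch of one plausible embedding-space filter: score each example by its distance to its k-means cluster centroid and drop the farthest fraction. The function name, cluster count, and drop fraction are all illustrative assumptions, not the paper's method.

```python
import numpy as np
from sklearn.cluster import KMeans

def prune_by_centroid_distance(embeddings: np.ndarray,
                               n_clusters: int = 100,
                               drop_fraction: float = 0.2) -> np.ndarray:
    """Return indices of examples to keep, dropping those farthest from
    their k-means centroid in embedding space (illustrative heuristic)."""
    # Normalize so Euclidean distance tracks cosine distance.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(emb)
    dists = np.linalg.norm(emb - km.cluster_centers_[km.labels_], axis=1)
    cutoff = np.quantile(dists, 1.0 - drop_fraction)
    return np.flatnonzero(dists <= cutoff)

# e.g. with stand-in data:
keep = prune_by_centroid_distance(np.random.randn(5000, 128))
print(f"kept {keep.size} of 5000 examples")
```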
1 code implementation • NeurIPS 2023 • Florian Bordes, Shashank Shekhar, Mark Ibrahim, Diane Bouchacourt, Pascal Vincent, Ari S. Morcos
Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth labels (and captions), and (iii) precisely control distribution shifts between training and testing to isolate variables of interest for sound experimentation.
1 code implementation • 16 Mar 2023 • Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos
Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time.
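As described, SemDeDup clusters embeddings and removes near-duplicates within each cluster. A minimal sketch of that idea follows; the released implementation differs in details (for example, which duplicate in a group is kept), so treat the greedy keep-first rule and thresholds below as assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def semdedup(embeddings: np.ndarray, n_clusters: int = 50,
             threshold: float = 0.95) -> np.ndarray:
    """Return keep-indices after removing semantic duplicates: within
    each k-means cluster, greedily drop any example whose cosine
    similarity to an already-kept example exceeds `threshold`."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(emb)
    keep = []
    for c in range(n_clusters):
        kept_in_cluster = []
        for i in np.flatnonzero(labels == c):
            sims = emb[kept_in_cluster] @ emb[i]  # cosine sims to kept examples
            if sims.size == 0 or sims.max() < threshold:
                kept_in_cluster.append(i)
        keep.extend(kept_in_cluster)
    return np.sort(np.array(keep))
```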
no code implementations • 30 Jan 2023 • Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra
A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial.
no code implementations • 19 Oct 2022 • Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos
When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step.
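A minimal sketch of the end-of-training step this enables: replicas fine-tune independently on their own data shards with no gradient communication, and their weights are averaged once at the end. Names here are illustrative, not the paper's code.

```python
import copy
import torch

def average_state_dicts(models):
    """Merge independently fine-tuned replicas by averaging weights.
    Each replica trains on its own shard with no gradient communication;
    this single averaging step replaces per-step synchronization."""
    avg = copy.deepcopy(models[0].state_dict())
    with torch.no_grad():
        for key in avg:
            stacked = torch.stack([m.state_dict()[key].float() for m in models])
            avg[key] = stacked.mean(dim=0).to(avg[key].dtype)
    return avg

# merged.load_state_dict(average_state_dicts([replica_0, replica_1, replica_2]))
```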
3 code implementations • 29 Jun 2022 • Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos
Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning.
5 code implementations • 10 Mar 2022 • Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt
The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder.
Ranked #1 on Image Classification on ImageNet V2 (using extra training data)
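The paper's "greedy soup" variant makes the alternative concrete: visit checkpoints from best to worst validation score and keep each one in a running weight average only if held-out accuracy does not drop. A sketch, assuming a user-supplied `evaluate(state_dict) -> float`:

```python
import copy
import torch

def _average(dicts):
    out = copy.deepcopy(dicts[0])
    for k in out:
        out[k] = torch.stack([d[k].float() for d in dicts]).mean(0).to(out[k].dtype)
    return out

def greedy_soup(state_dicts, val_scores, evaluate):
    """Greedily build a weight-averaged 'soup' of fine-tuned models."""
    order = sorted(range(len(state_dicts)), key=val_scores.__getitem__, reverse=True)
    soup = [state_dicts[order[0]]]
    best = evaluate(_average(soup))
    for i in order[1:]:
        candidate = _average(soup + [state_dicts[i]])
        if (score := evaluate(candidate)) >= best:
            soup.append(state_dicts[i])
            best = score
    return _average(soup)
```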
no code implementations • NeurIPS Workshop ImageNet_PPF 2021 • Chaitanya Ryali, David J. Schwab, Ari S. Morcos
Through a systematic, comprehensive investigation, we show that background augmentations lead to improved generalization with substantial improvements (~1-2% on ImageNet) in performance across a spectrum of state-of-the-art self-supervised methods (MoCo-v2, BYOL, SwAV) on a variety of tasks, even enabling performance on par with the supervised baseline.
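The core operation is simple: keep the foreground of an image and composite it onto a different background before the usual self-supervised augmentations. A minimal sketch, assuming a foreground mask is already available (e.g., from a saliency or segmentation model):

```python
import torch

def background_swap(image: torch.Tensor, fg_mask: torch.Tensor,
                    backgrounds: torch.Tensor) -> torch.Tensor:
    """Paste the foreground onto a randomly drawn background.
    `image`: (C, H, W); `fg_mask`: (1, H, W) with values in [0, 1];
    `backgrounds`: (N, C, H, W) pool of background images."""
    bg = backgrounds[torch.randint(len(backgrounds), (1,)).item()]
    return fg_mask * image + (1.0 - fg_mask) * bg
```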
1 code implementation • NeurIPS 2021 • Diane Bouchacourt, Mark Ibrahim, Ari S. Morcos
While prior work has focused on synthetic data, we attempt here to characterize the factors of variation in a real dataset, ImageNet, and study the invariance of both standard residual networks and the recently proposed vision transformer with respect to changes in these factors.
no code implementations • 24 Apr 2021 • Ting-Wu Chin, Diana Marculescu, Ari S. Morcos
In this work, we propose width transfer, a technique that harnesses the assumption that optimized widths (or channel counts) are regular across network sizes and depths.
no code implementations • 23 Mar 2021 • Chaitanya K. Ryali, David J. Schwab, Ari S. Morcos
Recent progress in self-supervised learning has demonstrated promising results in multiple visual tasks.
Ranked #83 on Image Classification on ObjectNet (using extra training data)
no code implementations • 1 Jan 2021 • Janice Lan, Rudy Chin, Alexei Baevski, Ari S. Morcos
However, prior work has implicitly assumed that the best training configuration for model performance was also the best configuration for mask discovery.
no code implementations • ACL 2021 • Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela
We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated.
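A minimal sketch of that setup: pick a random subset of layers, leave them at their random initialization, and exclude their parameters from the optimizer. The encoder and layer counts below are illustrative, not the paper's configuration.

```python
import random
import torch.nn as nn

def freeze_random_layers(layers: nn.ModuleList, n_frozen: int) -> None:
    """Leave a random subset of layers at their random initialization
    and exclude them from training."""
    for layer in random.sample(list(layers), n_frozen):
        for p in layer.parameters():
            p.requires_grad_(False)

# Usage (illustrative): freeze 2 of 6 transformer encoder layers.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=6,
)
freeze_random_layers(encoder.layers, n_frozen=2)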
no code implementations • 13 Oct 2020 • Tiffany Tianhui Cai, Jonathan Frankle, David J. Schwab, Ari S. Morcos
Using methodology from MoCo v2 (Chen et al., 2020), we divided negatives by their difficulty for a given query and studied which difficulty ranges were most important for learning useful representations.
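A sketch of the difficulty measure implied here: rank each queued negative by its cosine similarity to the query, so the most similar negatives count as the "hardest". Shapes and the 5% band below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def rank_negatives(query: torch.Tensor, queue: torch.Tensor) -> torch.Tensor:
    """Sort MoCo-style queued negatives from hardest to easiest, where
    difficulty is cosine similarity to the query.
    `query`: (D,); `queue`: (K, D)."""
    q = F.normalize(query, dim=0)
    negs = F.normalize(queue, dim=1)
    order = torch.argsort(negs @ q, descending=True)
    return negs[order]

# e.g. restrict training to a difficulty band, such as the hardest 5%:
# hard_negs = rank_negatives(q, queue)[: int(0.05 * len(queue))]
```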
no code implementations • 28 Sep 2020 • Matthew L Leavitt, Ari S. Morcos
We also found that the input-unit gradient was more variable across samples and units in high-selectivity networks compared to low-selectivity networks.
no code implementations • 28 Sep 2020 • Rudy Chin, Ari S. Morcos, Diana Marculescu
Slimmable neural networks provide a flexible trade-off front between prediction error and computational cost (such as the number of floating-point operations or FLOPs) with the same storage cost as a single model.
2 code implementations • 23 Jul 2020 • Ting-Wu Chin, Ari S. Morcos, Diana Marculescu
In this work, we propose a general framework to enable joint optimization for both width configurations and weights of slimmable networks.
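For context, here is a minimal sketch of the slimmable mechanism being optimized: a layer whose active width is chosen at run time by slicing a single shared weight tensor, trained at several widths per step. This illustrates weight sharing across widths only, not the paper's joint width-optimization procedure.

```python
import torch
import torch.nn as nn

class SlimmableLinear(nn.Linear):
    """Linear layer whose active width is switched at run time by
    slicing the full weight matrix (weight sharing across widths)."""
    def forward(self, x, in_width=None, out_width=None):
        in_w = in_width or self.in_features
        out_w = out_width or self.out_features
        bias = self.bias[:out_w] if self.bias is not None else None
        return nn.functional.linear(x[..., :in_w], self.weight[:out_w, :in_w], bias)

# One training step can sum losses over several widths so a single set
# of weights supports multiple error/FLOPs trade-offs (toy loss here):
layer = SlimmableLinear(128, 64)
x = torch.randn(8, 128)
loss = sum(layer(x, out_width=w).pow(2).mean() for w in (16, 32, 64))
loss.backward()
```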
no code implementations • 8 Jul 2020 • Matthew L. Leavitt, Ari S. Morcos
While the relative trade-offs between sparse and distributed representations in deep neural networks (DNNs) are well-studied, less is known about how these trade-offs apply to representations of semantically-meaningful information.
1 code implementation • 7 May 2020 • Ge Yang, Amy Zhang, Ari S. Morcos, Joelle Pineau, Pieter Abbeel, Roberto Calandra
In this paper we introduce plan2vec, an unsupervised representation learning approach that is inspired by reinforcement learning.
4 code implementations • ICLR 2021 • Jonathan Frankle, David J. Schwab, Ari S. Morcos
A wide variety of deep learning techniques from style transfer to multitask learning rely on training affine transformations of features.
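The extreme case this line of work studies is training only the per-feature affine parameters of BatchNorm while everything else stays frozen at random initialization. A sketch of that setup; the torchvision ResNet is an illustrative model choice.

```python
import torch.nn as nn
import torchvision

def batchnorm_only(model: nn.Module):
    """Freeze all parameters except the affine (gamma/beta) parameters
    of BatchNorm layers; return the trainable ones for the optimizer."""
    for p in model.parameters():
        p.requires_grad_(False)
    affine = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.affine:
            m.weight.requires_grad_(True)
            m.bias.requires_grad_(True)
            affine += [m.weight, m.bias]
    return affine

model = torchvision.models.resnet18()  # random init; model choice is illustrative
trainable = batchnorm_only(model)      # pass `trainable` to the optimizer
```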
1 code implementation • ICLR 2020 • Jonathan Frankle, David J. Schwab, Ari S. Morcos
We perform extensive measurements of the network state during these early iterations of training and leverage the framework of Frankle et al. (2019) to quantitatively probe the weight distribution and its reliance on various aspects of the dataset.
no code implementations • NeurIPS 2020 • Brian R. Bartoldson, Ari S. Morcos, Adrian Barbu, Gordon Erlebacher
Pruning neural network parameters is often viewed as a means to compress models, but pruning has also been motivated by the desire to prevent overfitting.
no code implementations • ICLR 2020 • Haonan Yu, Sergey Edunov, Yuandong Tian, Ari S. Morcos
The lottery ticket hypothesis proposes that over-parameterization of deep neural networks (DNNs) aids training by increasing the probability of a "lucky" sub-network initialization being present rather than by helping the optimization process (Frankle & Carbin, 2019).
2 code implementations • NeurIPS 2019 • Ari S. Morcos, Haonan Yu, Michela Paganini, Yuandong Tian
Perhaps surprisingly, we found that, within the natural images domain, winning ticket initializations generalized across a variety of datasets, including Fashion MNIST, SVHN, CIFAR-10/100, ImageNet, and Places365, often achieving performance close to that of winning tickets generated on the same dataset.
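A minimal sketch of how a winning ticket is produced and transferred: prune the smallest-magnitude weights globally, then reuse the resulting mask together with the original initialization when training on a new dataset. One-shot pruning is used below for brevity; the papers prune iteratively.

```python
import torch

def magnitude_mask(state_dict, sparsity: float) -> dict:
    """One-shot global magnitude pruning: mask the smallest-magnitude
    entries across all weight matrices/filters (dim > 1)."""
    weights = {k: v for k, v in state_dict.items() if v.dim() > 1}
    mags = torch.cat([v.abs().flatten() for v in weights.values()])
    threshold = mags.kthvalue(max(1, int(sparsity * mags.numel()))).values
    return {k: (v.abs() > threshold).float() for k, v in weights.items()}

def apply_ticket(model, init_state, mask):
    """Rewind surviving weights to the original initialization; reusing
    (mask, init) on a new dataset is the cross-dataset transfer studied."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in mask:
                p.copy_(init_state[name] * mask[name])
```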
2 code implementations • ICLR 2019 • Felix Hill, Adam Santoro, David G. T. Barrett, Ari S. Morcos, Timothy Lillicrap
Here, we study how analogical reasoning can be induced in neural networks that learn to perceive and reason about raw visual data.
no code implementations • 31 Oct 2018 • David G. T. Barrett, Ari S. Morcos, Jakob H. Macke
We explore opportunities for synergy between the two fields, such as the use of DNNs as in-silico model systems for neuroscience, and how this synergy can lead to new hypotheses about the operating principles of biological neural networks.
2 code implementations • ICML 2018 • David G. T. Barrett, Felix Hill, Adam Santoro, Ari S. Morcos, Timothy Lillicrap
To succeed at this challenge, models must cope with various generalisation 'regimes' in which the training and test data differ in clearly-defined ways.
no code implementations • 3 Jul 2018 • Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel
Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games.
2 code implementations • NeurIPS 2018 • Ari S. Morcos, Maithra Raghu, Samy Bengio
Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training.
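The comparison tool in this line of work is CCA-based. Below is a minimal sketch of plain mean-CCA similarity between two layers' activation matrices; the paper's refinement additionally weights each CCA direction by how much of the representation it accounts for (projection weighting), which is omitted here.

```python
import numpy as np

def mean_cca(X: np.ndarray, Y: np.ndarray, eps: float = 1e-6) -> float:
    """Mean canonical correlation between two activation matrices of
    shape (examples, neurons). Canonical correlations are the singular
    values of the product of the two orthonormal column bases."""
    def orthonormal_basis(A):
        A = A - A.mean(axis=0)
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        return U[:, s > eps * s.max()]  # drop near-null directions
    rho = np.linalg.svd(orthonormal_basis(X).T @ orthonormal_basis(Y),
                        compute_uv=False)
    return float(rho.mean())

# sim = mean_cca(layer_acts_net1, layer_acts_net2)  # same inputs to both nets
```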
no code implementations • ICLR 2019 • Avraham Ruderman, Neil C. Rabinowitz, Ari S. Morcos, Daniel Zoran
In this work, we rigorously test these questions, and find that deformation stability in convolutional networks is more nuanced than it first appears: (1) deformation invariance is not a binary property; rather, different tasks require different degrees of deformation stability at different layers.
1 code implementation • ICLR 2018 • Ari S. Morcos, David G. T. Barrett, Neil C. Rabinowitz, Matthew Botvinick
Finally, we find that class selectivity is a poor predictor of task importance, suggesting not only that networks which generalize well minimize their dependence on individual units by reducing their selectivity, but also that individually selective units may not be necessary for strong network performance.
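The class selectivity index used here has a simple closed form: (mu_max - mu_rest) / (mu_max + mu_rest), where mu_max is a unit's mean activity over its most-activating class and mu_rest its mean activity over all remaining classes. A sketch, assuming non-negative (post-ReLU) activations:

```python
import numpy as np

def class_selectivity(activations: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-unit class selectivity index for `activations` of shape
    (examples, units) and integer class `labels` of shape (examples,)."""
    classes = np.unique(labels)
    class_means = np.stack([activations[labels == c].mean(axis=0) for c in classes])
    mu_max = class_means.max(axis=0)                               # best class per unit
    mu_rest = (class_means.sum(axis=0) - mu_max) / (len(classes) - 1)
    return (mu_max - mu_rest) / (mu_max + mu_rest + 1e-12)
```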