no code implementations • 23 Nov 2023 • Cyrus Zhou, Vaughn Richard, Pedro Savarese, Zachary Hassman, Michael Maire, Michael DiBrino, Yanjing Li
The mixed-precision network design that achieves optimized tradeoffs corresponds to an architecture supporting 1-, 2-, and 4-bit fixed-point operations with four configurable precision patterns. When coupled with system-aware training and inference optimization, networks trained for this design achieve accuracies that closely match full-precision accuracies, while drastically compressing the networks and improving their run-time efficiency by 10-20x compared to full-precision networks.
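The fixed-point quantization underlying such mixed-precision designs can be sketched as a symmetric uniform quantizer over the supported bit-widths (a generic illustration, not the authors' exact scheme; the scale and rounding choices here are assumptions):

```python
import numpy as np

def quantize_fixed_point(w, bits):
    """Symmetric uniform quantization of a weight array to `bits`-bit fixed point.

    Illustrative only: hardware-aware schemes typically choose scales
    per layer or per channel and may round differently.
    """
    if bits == 1:
        # Binary case: sign times the mean magnitude (BinaryConnect-style).
        return np.sign(w) * np.mean(np.abs(w))
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit, 1 for 2-bit
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)  # integer code words
    return q * scale                               # dequantized for simulation

w = np.array([0.8, -0.31, 0.05, -0.72])
for b in (4, 2, 1):
    print(b, quantize_fixed_point(w, b))
```

Lower bit-widths coarsen the grid, which is why mixing 1-, 2-, and 4-bit operations per layer lets a design trade accuracy against efficiency.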
no code implementations • NeurIPS 2021 • Sudarshan Babu, Pedro Savarese, Michael Maire
We demonstrate that efficient meta-learning can be achieved via end-to-end training of deep neural networks with memory distributed across layers.
1 code implementation • CVPR 2021 • Pedro Savarese, Sunnie S. Y. Kim, Michael Maire, Greg Shakhnarovich, David McAllester
We study image segmentation from an information-theoretic perspective, proposing a novel adversarial method that performs unsupervised segmentation by partitioning images into maximally independent sets.
Ranked #1 on Unsupervised Image Segmentation on Flowers
no code implementations • ICLR 2021 • Xin Yuan, Pedro Savarese, Michael Maire
We develop an approach to growing deep network architectures over the course of training, driven by a principled combination of accuracy and sparsity objectives.
1 code implementation • 20 Feb 2020 • Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro
We provide a complete and detailed analysis for a family of simple depth-$D$ models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
2 code implementations • NeurIPS 2020 • Pedro Savarese, Hugo Silva, Michael Maire
Additionally, the recent Lottery Ticket Hypothesis conjectures that, for a typically sized neural network, it is possible to find small sub-networks which, when trained from scratch on a comparable budget, match the performance of their original dense counterpart.
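The sub-network search behind this conjecture is commonly illustrated with one-shot magnitude pruning, sketched below (a standard baseline, not the paper's continuous sparsification method, which learns the mask during training instead of thresholding magnitudes):

```python
import numpy as np

def lottery_ticket_mask(weights, sparsity):
    """One-shot magnitude pruning: keep the largest-magnitude weights.

    Returns a binary mask with roughly `sparsity` fraction of entries zeroed.
    The masked weights would then be rewound and retrained from scratch.
    """
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)                 # number of weights to prune
    threshold = np.sort(flat)[k] if k < len(flat) else np.inf
    return (np.abs(weights) >= threshold).astype(np.float32)

w = np.array([[0.5, -0.01], [0.2, -0.9]])
mask = lottery_ticket_mask(w, 0.5)    # keeps the two largest-magnitude weights
```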
1 code implementation • CVPR 2021 • Pedro Savarese, David McAllester, Sudarshan Babu, Michael Maire
From a simplified analysis of adaptive methods, we derive AvaGrad, a new optimizer which outperforms SGD on vision tasks when its adaptability is properly tuned.
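For context, the Adam-style adaptive family that such analyses start from can be sketched as below (a generic single-parameter-group update; this is not AvaGrad itself, whose actual rule differs in how the epsilon-controlled adaptability interacts with the learning rate):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of a generic Adam-style adaptive method.

    `eps` controls adaptability: as eps grows, the per-parameter scaling
    1/(sqrt(v_hat) + eps) flattens out and the update approaches plain SGD.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w**2 from w = 1 (gradient is 2*w).
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    w, m, v = adam_step(w, 2.0 * w, m, v, t, lr=1e-2)
```

The interaction between `lr` and `eps` in the last line is exactly the kind of coupling a properly tuned adaptability is meant to address.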
no code implementations • 13 Aug 2019 • Daniel Specht Menezes, Pedro Savarese, Ruy Luiz Milidiú
With the recent progress in machine learning, boosted by techniques such as deep learning, many tasks can be successfully solved once a large enough dataset is available for training.
1 code implementation • 13 Aug 2019 • Pedro Savarese
Adaptive gradient methods such as Adam have gained immense popularity due to their success in training complex neural networks and their lower sensitivity to hyperparameter tuning compared to SGD.
1 code implementation • 13 Jun 2019 • Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro
A recent line of work studies overparametrized neural networks in the "kernel regime," i.e., when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution.
1 code implementation • ICLR 2019 • Pedro Savarese, Michael Maire
Restricting the number of templates yields a flexible hybridization of traditional CNNs and recurrent networks.
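The template idea can be sketched as layers whose weights are linear combinations of a small shared bank (a simplified reading of the parameter-sharing scheme; shapes and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
num_templates, num_layers, shape = 2, 4, (8, 8)

# Shared bank of weight templates and per-layer mixing coefficients,
# both of which would be learned jointly by gradient descent.
templates = rng.standard_normal((num_templates, *shape))
alphas = rng.standard_normal((num_layers, num_templates))

def layer_weight(i):
    """Weights of layer i: a linear combination of the shared templates."""
    return np.tensordot(alphas[i], templates, axes=1)

# If two layers learn identical coefficients, they share weights exactly,
# so the stack behaves like a recurrent (weight-tied) network there.
alphas[3] = alphas[0]
assert np.allclose(layer_weight(3), layer_weight(0))
```

With few templates, distinct layers are forced toward shared weights, which is one way to read the claimed hybridization of CNNs and recurrent networks.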
no code implementations • 13 Feb 2019 • Pedro Savarese, Itay Evron, Daniel Soudry, Nathan Srebro
We consider the question of which functions can be captured by ReLU networks with an unbounded number of units (infinite width), but where the overall Euclidean norm of the network (the sum of squares of all weights in the system, except for an unregularized bias term for each unit) is bounded; or, equivalently, what is the minimal norm required to approximate a given function.
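For scalar inputs, the minimal-norm characterization from this line of work can be stated as follows (recalled here informally, under standard regularity assumptions; see the paper for the precise statement): the minimal network norm needed to represent $f : \mathbb{R} \to \mathbb{R}$ is

$$\max\left(\int_{-\infty}^{\infty} |f''(x)|\,dx,\; \big|f'(-\infty) + f'(+\infty)\big|\right),$$

so the cost of a function is governed by the total curvature of $f$ rather than by any parameter count.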