no code implementations • 24 Jun 2023 • Brando Miranda, Patrick Yu, Saumya Goyal, Yu-Xiong Wang, Sanmi Koyejo
Using this analysis, we demonstrate the following: (1) when the formal diversity of a dataset is low, pre-training (PT) beats Model-Agnostic Meta-Learning (MAML) on average, and (2) when the formal diversity is high, MAML beats PT on average.
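The formal diversity referenced here is, in this line of work, a diversity coefficient: an expected pairwise distance between embeddings of tasks sampled from the benchmark. A minimal sketch of that computation, where `embed_task` is a hypothetical stand-in for a Task2Vec-style task embedding and the paper's exact pipeline may differ:

```python
# Hedged sketch: diversity coefficient as the average pairwise distance
# between embeddings of sampled tasks. `embed_task` is a hypothetical
# stand-in for a Task2Vec-style embedding of a few-shot task.
import itertools
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def diversity_coefficient(tasks, embed_task):
    """Average pairwise cosine distance between task embeddings."""
    embeddings = [embed_task(t) for t in tasks]
    pairs = itertools.combinations(embeddings, 2)
    return float(np.mean([cosine_distance(u, v) for u, v in pairs]))
```

On the reading above, a low value of this coefficient is the regime where PT is expected to win on average, and a high value the regime where MAML is expected to win.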
no code implementations • 24 Jun 2023 • Alycia Lee, Brando Miranda, Sudharsan Sundar, Sanmi Koyejo
Current trends in pre-training capable Large Language Models (LLMs) mostly focus on scaling model and dataset size.
no code implementations • NeurIPS 2023 • Rylan Schaeffer, Brando Miranda, Sanmi Koyejo
Recent work claims that large language models display emergent abilities: abilities that are not present in smaller-scale models but are present in larger-scale models.
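The snippet above states only the claim under scrutiny. As a hedged illustration of one mechanism examined in this area (whether the apparent discontinuity comes from the evaluation metric rather than from the model), here is a toy example with made-up numbers, not results from the paper:

```python
# Toy illustration (made-up numbers, not the paper's results): if per-token
# accuracy improves smoothly with scale, an all-or-nothing metric such as
# exact match over a long answer can still jump sharply, because
# exact_match ~= per_token_accuracy ** answer_length.
import numpy as np

per_token_acc = np.linspace(0.5, 0.95, 10)    # smooth, gradual improvement
answer_length = 20
exact_match = per_token_acc ** answer_length  # flat near zero, then rises sharply

for smooth, sharp in zip(per_token_acc, exact_match):
    print(f"per-token {smooth:.2f} -> exact-match {sharp:.6f}")
```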
no code implementations • 15 Mar 2023 • Brando Miranda, Avi Shinnar, Vasily Pestun, Barry Trager
Despite a growing body of work at the intersection of deep learning and formal languages, there has been relatively little systematic exploration of transformer models for reasoning about typed lambda calculi.
no code implementations • 2 Aug 2022 • Brando Miranda, Patrick Yu, Yu-Xiong Wang, Sanmi Koyejo
This novel insight contextualizes claims that transfer learning solutions are better than meta-learned solutions: under a fair comparison, such claims hold in the regime of low diversity.
no code implementations • 24 Dec 2021 • Brando Miranda, Yu-Xiong Wang, Sanmi Koyejo
We hypothesize that the diversity coefficient of the few-shot learning benchmark is predictive of whether meta-learning solutions will succeed or not.
1 code implementation • 24 Dec 2021 • Brando Miranda, Yu-Xiong Wang, Sanmi Koyejo
Recent work has suggested that a good embedding is all we need to solve many few-shot learning benchmarks.
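The claim being examined here is that a fixed, pre-trained embedding plus a simple classifier fit per episode already solves many few-shot benchmarks. A minimal sketch of that style of baseline, with illustrative names (`embed` and the episode arrays are assumptions, not the paper's code):

```python
# Hedged sketch of a "good embedding is all you need" style baseline:
# a frozen feature extractor plus a simple classifier fit on the support
# set of each few-shot episode. Names are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def solve_episode(embed, support_x, support_y, query_x):
    """Fit a linear classifier on embedded support examples, predict queries."""
    z_support = np.stack([embed(x) for x in support_x])
    z_query = np.stack([embed(x) for x in query_x])
    clf = LogisticRegression(max_iter=1000).fit(z_support, support_y)
    return clf.predict(z_query)
```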
no code implementations • 12 Mar 2019 • Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Fernanda De La Torre, Jack Hidary, Tomaso Poggio
In particular, gradient descent induces dynamics on the normalized weights that converge, for $t \to \infty$, to an equilibrium corresponding to a minimum-norm (or maximum-margin) solution.
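A minimal sketch of the statement's shape, with notation chosen here rather than taken from the paper: writing $\tilde{w} = w/\|w\|$ for the normalized weights, gradient flow on $w$ induces dynamics on $\tilde{w}$ whose limit is a maximum-margin direction.

```latex
% Hedged sketch (notation mine): gradient flow on w and the induced
% dynamics of the normalized weights \tilde{w} = w / \|w\|.
\dot{w} = -\nabla L(w),
\qquad
\dot{\tilde{w}} = \frac{1}{\|w\|}\bigl(I - \tilde{w}\tilde{w}^{\top}\bigr)\dot{w},
\qquad
\tilde{w}(t) \xrightarrow{\,t \to \infty\,} \tilde{w}^{*}
\in \arg\max_{\|u\|=1}\,\min_{i}\, y_i\, f(x_i; u).
```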
3 code implementations • 25 Jul 2018 • Qianli Liao, Brando Miranda, Andrzej Banburski, Jack Hidary, Tomaso Poggio
Given two networks with the same training loss on a dataset, when would they have drastically different test losses and errors?
no code implementations • 29 Jun 2018 • Tomaso Poggio, Qianli Liao, Brando Miranda, Andrzej Banburski, Xavier Boix, Jack Hidary
Here we prove a similar result for nonlinear multilayer DNNs near zero minima of the empirical loss.
no code implementations • 7 Jan 2018 • Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio
In Theory IIb, we use a mix of theory and experiments to characterize the optimization of deep convolutional networks by Stochastic Gradient Descent.
no code implementations • 30 Dec 2017 • Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar
In this note, we show that the dynamics associated with gradient descent minimization of nonlinear networks are topologically equivalent, near the asymptotically stable minima of the empirical error, to a linear gradient system in a quadratic potential with a degenerate (for square loss) or almost degenerate (for logistic or cross-entropy loss) Hessian.
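The equivalence asserted here can be read through the standard second-order picture near an asymptotically stable minimum $w^{*}$ of the empirical loss (a textbook linearization sketch, not the paper's full argument):

```latex
% Near a minimum w* of the empirical loss L, with Hessian H:
L(w) \approx L(w^{*}) + \tfrac{1}{2}(w - w^{*})^{\top} H (w - w^{*}),
\qquad H = \nabla^{2} L(w^{*}) \succeq 0,
\qquad
\dot{w} = -\nabla L(w) \approx -H\,(w - w^{*}).
```

Here $H$ is degenerate (has zero eigenvalues) for the square loss at zero-error minima and almost degenerate for logistic or cross-entropy losses, matching the statement above.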
no code implementations • 2 Nov 2016 • Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao
The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning.
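A hedged, representative instance of the function classes in question (an illustration of hierarchical compositionality, not the paper's precise theorem):

```latex
% A hierarchically compositional target on d = 8 inputs, built from
% constituent functions of only two variables each:
f(x_1, \dots, x_8) =
  h_{3}\Bigl(
    h_{21}\bigl(h_{11}(x_1, x_2),\, h_{12}(x_3, x_4)\bigr),\,
    h_{22}\bigl(h_{13}(x_5, x_6),\, h_{14}(x_7, x_8)\bigr)
  \Bigr).
```

For targets of this shape, the flavor of result is that a deep network matching the hierarchy needs a number of units growing polynomially in $1/\varepsilon$ and linearly in $d$, whereas a generic shallow network can require a number exponential in $d$ (constants and smoothness assumptions omitted); this is the sense in which deep learning can be exponentially better than shallow learning.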