1 code implementation • 14 Feb 2024 • Clayton Sanford, Daniel Hsu, Matus Telgarsky
We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation.
no code implementations • 27 Oct 2022 • Alberto Bietti, Joan Bruna, Clayton Sanford, Min Jae Song
Single-index models are a class of functions given by an unknown univariate "link" function applied to an unknown one-dimensional projection of the input.
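In symbols (notation ours, following the standard definition), a single-index model takes the form

```latex
% Single-index model: an unknown 1-D "link" \phi composed with an unknown
% linear projection along a direction w; both \phi and w are unknown.
f(x) = \phi(\langle w, x \rangle), \qquad x \in \mathbb{R}^d,\ w \in \mathbb{S}^{d-1},
```

so learning reduces to recovering the direction $w$ and the univariate link $\phi$.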
1 code implementation • 11 Oct 2022 • Vaggos Chatziafratis, Ioannis Panageas, Clayton Sanford, Stelios Andrew Stavroulakis
Recurrent Neural Networks (RNNs) frequently exhibit complicated dynamics, and their sensitivity to the initialization process often renders them notoriously hard to train.
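As a toy illustration of such dynamics (our own sketch, not a construction from the paper), even a one-neuron tanh RNN, viewed as a scalar map, flips from convergence to sustained oscillation as the recurrent weight varies:

```python
import numpy as np

def iterate_rnn(w: float, x0: float, steps: int = 50) -> np.ndarray:
    """Iterate the scalar tanh RNN x_{t+1} = tanh(w * x_t)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(np.tanh(w * xs[-1]))
    return np.array(xs)

# |w| < 1: the origin is a stable fixed point; trajectories contract to 0.
print(iterate_rnn(0.5, 0.9)[-3:])   # ~[0, 0, 0]

# w < -1: the origin destabilizes and a stable period-2 oscillation emerges,
# so the long-run behavior flips qualitatively with the weight.
print(iterate_rnn(-2.0, 0.9)[-3:])  # alternates between roughly +/-0.96
```

Nearby parameter settings thus produce qualitatively different long-run behavior, one intuition for why training is so sensitive to initialization.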
1 code implementation • 10 Jun 2022 • Navid Ardeshir, Daniel Hsu, Clayton Sanford
We study the structural and statistical properties of $\mathcal{R}$-norm minimizing interpolants of datasets labeled by specific target functions.
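For orientation, and offered as standard background rather than this paper's contribution: in one dimension, the $\mathcal{R}$-norm is commonly identified (following Savarese et al., 2019) with the minimal weight norm of an infinite-width two-layer ReLU network representing $f$:

```latex
% One-dimensional case: R-norm as the total variation of f' together with a
% boundary term, equal to the least weight norm over infinite-width
% two-layer ReLU networks that represent f.
\|f\|_{\mathcal{R}} = \max\left( \int_{\mathbb{R}} |f''(x)|\, dx,\; \big|f'(-\infty) + f'(+\infty)\big| \right).
```

Minimizing this norm over interpolants therefore favors functions whose derivative has small total variation.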
no code implementations • 10 Feb 2022 • Daniel Hsu, Clayton Sanford, Rocco Servedio, Emmanouil-Vasileios Vlatakis-Gkaragkounis
This lower bound is essentially best possible since an SQ algorithm of Klivans et al. (2008) agnostically learns this class to any constant excess error using $n^{O(\log k)}$ queries of tolerance $n^{-O(\log k)}$.
no code implementations • 19 Oct 2021 • Clayton Sanford, Vaggos Chatziafratis
Given a target function $f$, how large must a neural network be in order to approximate $f$?
1 code implementation • NeurIPS 2021 • Navid Ardeshir, Clayton Sanford, Daniel Hsu
The support vector machine (SVM) and minimum Euclidean norm least squares regression are two fundamentally different approaches to fitting linear models. Recently, however, they have been connected in models for very high-dimensional data through the phenomenon of support vector proliferation, in which every training example used to fit an SVM becomes a support vector.
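A minimal numerical sketch of this connection (our own illustration using scikit-learn, not the paper's code): in high dimension with few samples, a hard-margin linear SVM tends to make every sample a support vector, at which point its solution coincides with minimum-norm least squares on the $\pm 1$ labels.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 20, 2000                      # far more dimensions than samples
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)  # random labels; still separable here

# Hard-margin linear SVM, approximated with a very large C.
svm = SVC(kernel="linear", C=1e10).fit(X, y)
print("support vectors:", len(svm.support_), "of", n)  # typically all n

# Minimum-norm least squares interpolant of the +/-1 labels.
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]
w_svm = svm.coef_.ravel()

cos = w_ls @ w_svm / (np.linalg.norm(w_ls) * np.linalg.norm(w_svm))
print("cosine similarity:", cos)     # ~1 when every sample is a support vector
```

When all margin constraints are tight, the SVM solves $\min \|w\|$ subject to $Xw = y$, which is exactly the minimum-norm interpolation problem, hence the near-perfect alignment.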
no code implementations • 3 Feb 2021 • Daniel Hsu, Clayton Sanford, Rocco A. Servedio, Emmanouil-Vasileios Vlatakis-Gkaragkounis
This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions?
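A small experiment in the spirit of that question (our sketch, not the paper's): freeze random Gaussian bottom-level weights and fit only the top layer by least squares to approximate a smooth target such as a cosine.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_train = 500, 200

# Random bottom-level weights and biases, frozen after initialization.
W = rng.standard_normal(n_features)
b = rng.uniform(-1.0, 1.0, n_features)

def relu_features(x):
    # Each column is one ReLU unit applied to the 1-D input x.
    return np.maximum(np.outer(x, W) + b, 0.0)

x_train = rng.uniform(-1.0, 1.0, n_train)
y_train = np.cos(3.0 * x_train)              # a smooth target function

# Only the top-level weights are trained (linear least squares).
a, *_ = np.linalg.lstsq(relu_features(x_train), y_train, rcond=None)

x_test = np.linspace(-1.0, 1.0, 1000)
err = np.max(np.abs(relu_features(x_test) @ a - np.cos(3.0 * x_test)))
print(f"sup-norm error on [-1, 1]: {err:.4f}")  # small for smooth targets
```

The paper's question is how the required width scales with the smoothness of the target when the bottom layer is random rather than trained.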
no code implementations • 18 Dec 2018 • Clayton Sanford, Cyrus Cousins, Eli Upfal
We frame the problem of selecting an optimal audio encoding scheme as a supervised learning task.
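As a schematic of that framing (hypothetical features and labels of our own devising, not the paper's pipeline): extract a feature vector from each audio clip, and train a classifier to predict which encoding scheme performs best on it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical setup: each row describes an audio clip (e.g., spectral
# statistics), and the label is the index of the encoding scheme with the
# best quality/size trade-off on that clip.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 16))   # stand-in clip features
y = rng.integers(0, 3, size=1000)     # stand-in "best codec" labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# At deployment time, the learned model selects an encoder per clip.
# (The stand-in data here is random, so accuracy is chance-level; real
# features and labels would reflect learnable structure.)
print("held-out accuracy:", clf.score(X_te, y_te))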