Search Results for author: Theodor Misiakiewicz

Found 18 papers, 3 papers with code

Asymptotics of Random Feature Regression Beyond the Linear Scaling Regime

no code implementations13 Mar 2024 Hong Hu, Yue M. Lu, Theodor Misiakiewicz

On the other hand, if $p = o(n)$, the number of random features $p$ is the limiting factor and RFRR test error matches the approximation error of the random feature model class (akin to taking $n = \infty$).
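
A minimal numpy sketch of the random-feature ridge regression (RFRR) model discussed above, in a feature-limited setting with $p$ much smaller than $n$; the dimensions, ReLU activation and target function are illustrative placeholders, not the paper's setup.

    # Random-feature ridge regression (RFRR) with p features and n samples.
    # All dimensions, the activation and the target are illustrative choices.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n, p, lam = 50, 2000, 400, 1e-3        # p = o(n): feature-limited regime

    beta = rng.standard_normal(d)
    f_star = lambda x: np.tanh(x @ beta)      # hypothetical target function

    X = rng.standard_normal((n, d)) / np.sqrt(d)
    X_test = rng.standard_normal((500, d)) / np.sqrt(d)
    y, y_test = f_star(X), f_star(X_test)

    W = rng.standard_normal((p, d))           # random first-layer weights, kept fixed
    Phi = np.maximum(X @ W.T, 0.0)            # ReLU random features, shape (n, p)
    Phi_test = np.maximum(X_test @ W.T, 0.0)

    a = np.linalg.solve(Phi.T @ Phi + lam * n * np.eye(p), Phi.T @ y)   # ridge solution
    print(np.mean((Phi_test @ a - y_test) ** 2))                        # test error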

regression

A non-asymptotic theory of Kernel Ridge Regression: deterministic equivalents, test error, and GCV estimator

no code implementations13 Mar 2024 Theodor Misiakiewicz, Basil Saeed

Specifically, we establish in this setting a non-asymptotic deterministic approximation for the test error of KRR -- with explicit non-asymptotic bounds -- that only depends on the eigenvalues and the target function alignment to the eigenvectors of the kernel.
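
A minimal sketch of kernel ridge regression and the standard GCV estimate of its risk, to make the objects concrete; the Gaussian kernel, synthetic data and regularization are placeholders, and this is not the paper's deterministic equivalent.

    # Kernel ridge regression (KRR) and the usual GCV risk estimate.
    # Kernel, data and regularization are illustrative placeholders.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, lam = 500, 20, 1e-2

    X = rng.standard_normal((n, d)) / np.sqrt(d)
    y = np.sin(X @ np.ones(d)) + 0.1 * rng.standard_normal(n)

    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / 2.0)                               # Gaussian kernel matrix

    S = K @ np.linalg.inv(K + lam * n * np.eye(n))      # smoother (hat) matrix
    resid = y - S @ y

    # GCV(lam) = (1/n) ||(I - S) y||^2 / ((1/n) tr(I - S))^2
    gcv = (resid @ resid / n) / (1.0 - np.trace(S) / n) ** 2
    print(gcv)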

Six Lectures on Linearized Neural Networks

no code implementations25 Aug 2023 Theodor Misiakiewicz, Andrea Montanari

In these six lectures, we examine what can be learnt about the behavior of multi-layer neural networks from the analysis of linear models.

regression

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

no code implementations21 Feb 2023 Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $\tilde\Theta (d^{\max(\mathrm{Leap}(f), 2)})$.
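
A rough worked illustration of the conjecture (an informal paraphrase; the exact definition of $\mathrm{Leap}$ is in the paper): loosely, $\mathrm{Leap}(f)$ is the largest number of new coordinates a single monomial of $f$ introduces when the monomials are learned in a favorable order, so a staircase-like function has leap 1 while an isolated degree-3 monomial has leap 3:

    $\mathrm{Leap}(x_1 + x_1 x_2 + x_1 x_2 x_3) = 1 \;\Rightarrow\; \tilde\Theta(d^{\max(1, 2)}) = \tilde\Theta(d^2)$
    $\mathrm{Leap}(x_1 x_2 x_3) = 3 \;\Rightarrow\; \tilde\Theta(d^{\max(3, 2)}) = \tilde\Theta(d^3)$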

Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression

no code implementations30 May 2022 Lechao Xiao, Hong Hu, Theodor Misiakiewicz, Yue M. Lu, Jeffrey Pennington

As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes.

regression

Spectrum of inner-product kernel matrices in the polynomial regime and multiple descent phenomenon in kernel ridge regression

no code implementations21 Apr 2022 Theodor Misiakiewicz

In this regime, the kernel matrix is well approximated by its degree-$\ell$ polynomial approximation and can be decomposed into a low-rank spike matrix, identity and a `Gegenbauer matrix' with entries $Q_\ell (\langle \textbf{x}_i , \textbf{x}_j \rangle)$, where $Q_\ell$ is the degree-$\ell$ Gegenbauer polynomial.
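
A small numpy/scipy sketch of the `Gegenbauer matrix' above for points on the sphere; scipy's $C_\ell^{(\alpha)}$ normalization may differ from the paper's $Q_\ell$ by a constant, and $n$, $d$, $\ell$ are illustrative.

    # Entrywise degree-ell Gegenbauer matrix Q_ell(<x_i, x_j>) on the unit sphere.
    # scipy's C_ell^(alpha) differs from the paper's normalization by a constant.
    import numpy as np
    from scipy.special import eval_gegenbauer

    rng = np.random.default_rng(2)
    n, d, ell = 300, 100, 2
    alpha = (d - 2) / 2.0                     # Gegenbauer parameter for S^{d-1}

    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)    # x_i uniform on the sphere

    G = X @ X.T                               # inner products <x_i, x_j>
    Q = eval_gegenbauer(ell, alpha, G)        # entrywise Q_ell(<x_i, x_j>)
    print(Q.shape)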

The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks

no code implementations17 Feb 2022 Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints.

Learning with convolution and pooling operations in kernel methods

no code implementations16 Nov 2021 Theodor Misiakiewicz, Song Mei

Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks.
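
A toy 1-D illustration of a convolutional kernel in the spirit described: a base inner-product kernel applied patchwise, then averaged over patch locations (pooling). This is a generic construction for illustration, not the specific hierarchical kernels analyzed in the paper.

    # Toy 1-D convolutional kernel: apply a base kernel to corresponding patches
    # and average over locations (global pooling). A simple generic instance.
    import numpy as np

    def patches(x, q):
        """All contiguous length-q patches of a 1-D signal x (cyclic)."""
        return np.stack([np.roll(x, -i)[:q] for i in range(len(x))])

    def base_kernel(u, v):
        return (1.0 + u @ v) ** 3            # inner-product (polynomial) kernel on patches

    def conv_kernel(x, y, q=5):
        Px, Py = patches(x, q), patches(y, q)
        return np.mean([base_kernel(px, py) for px, py in zip(Px, Py)])

    rng = np.random.default_rng(3)
    x, y = rng.standard_normal(32), rng.standard_normal(32)
    print(conv_kernel(x, y))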

Image Classification

Minimum complexity interpolation in random features models

no code implementations30 Mar 2021 Michael Celentano, Theodor Misiakiewicz, Andrea Montanari

We study random features approximations to these norms and show that, for $p>1$, the number of random features required to approximate the original learning problem is upper bounded by a polynomial in the sample size.

Learning with invariances in random features and kernel models

no code implementations25 Feb 2021 Song Mei, Theodor Misiakiewicz, Andrea Montanari

Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties.

Data Augmentation

Generalization error of random features and kernel methods: hypercontractivity and kernel matrix concentration

no code implementations26 Jan 2021 Song Mei, Theodor Misiakiewicz, Andrea Montanari

We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N\le n^{1-\delta}$ for some $\delta>0$.

regression

When Do Neural Networks Outperform Kernel Methods?

1 code implementation NeurIPS 2020 Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance.

Image Classification

Limitations of Lazy Training of Two-layers Neural Network

1 code implementation NeurIPS 2019 Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari

We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.


Limitations of Lazy Training of Two-layers Neural Networks

1 code implementation21 Jun 2019 Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.
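
A short sketch generating data from the two models in the abstract; the particular quadratic $f_*$ and the mixture covariances are illustrative placeholders.

    # Synthetic data for the two models in the abstract; the quadratic f_* and
    # the mixture covariances below are placeholders, not the paper's choices.
    import numpy as np

    rng = np.random.default_rng(4)
    n, d = 1000, 50

    # Model (1): Gaussian feature vectors, responses y_i = f_*(x_i) for a quadratic f_*
    A = rng.standard_normal((d, d)) / d
    b = rng.standard_normal(d) / np.sqrt(d)
    X1 = rng.standard_normal((n, d))
    y1 = np.einsum('ni,ij,nj->n', X1, A, X1) + X1 @ b

    # Model (2): mixture of two centered d-dimensional Gaussians (differing here
    # only in isotropic scale), with y_i the class label in {-1, +1}
    labels = rng.integers(0, 2, size=n)
    scales = np.where(labels == 1, 1.2, 0.8)
    X2 = rng.standard_normal((n, d)) * scales[:, None]
    y2 = 2 * labels - 1

    print(X1.shape, y1.shape, X2.shape, y2.shape)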


Linearized two-layers neural networks in high dimension

no code implementations27 Apr 2019 Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$.
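
For concreteness, a minimal sketch of the two feature maps being linearized: the random-features (RF) map, which trains only the second layer, and the neural-tangent (NT) map, which linearizes in the first-layer weights; the width and scalings are illustrative.

    # RF and NT feature maps of a two-layer ReLU network with N neurons and
    # fixed random first-layer weights W. Dimensions are illustrative.
    import numpy as np

    rng = np.random.default_rng(7)
    d, N = 30, 100
    W = rng.standard_normal((N, d)) / np.sqrt(d)

    def rf_features(x):
        # RF model: train only the second layer; features are sigma(<w_j, x>)
        return np.maximum(W @ x, 0.0)                              # shape (N,)

    def nt_features(x):
        # NT model: linearize in the first-layer weights; features are
        # sigma'(<w_j, x>) * x stacked over neurons j
        return ((W @ x > 0).astype(float)[:, None] * x).ravel()    # shape (N * d,)

    x = rng.standard_normal(d)
    print(rf_features(x).shape, nt_features(x).shape)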

regression

Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

no code implementations16 Feb 2019 Song Mei, Theodor Misiakiewicz, Andrea Montanari

Earlier work shows that (under some regularity assumptions), the mean field description is accurate as soon as the number of hidden units is much larger than the dimension $D$.

Efficient reconstruction of transmission probabilities in a spreading process from partial observations

no code implementations23 Sep 2015 Andrey Y. Lokhov, Theodor Misiakiewicz

A number of recent papers introduced efficient algorithms for the estimation of spreading parameters, based on the maximization of the likelihood of observed cascades, assuming that the full information for all the nodes in the network is available.
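
A minimal simulation of the kind of spreading process whose transmission probabilities one would want to reconstruct: a discrete-time SI / independent-cascade dynamics with per-edge transmission probabilities, with activation times recorded only for a subset of observed nodes. The graph, probabilities and observation scheme are illustrative, and this is only the forward model, not the paper's reconstruction algorithm.

    # Simulate a discrete-time SI / independent-cascade spreading process with
    # per-edge transmission probabilities, observing only a subset of nodes.
    import numpy as np

    rng = np.random.default_rng(5)
    N, T = 30, 15

    adj = rng.random((N, N)) < 0.1                    # random directed graph
    np.fill_diagonal(adj, False)
    trans = np.where(adj, rng.uniform(0.1, 0.6, (N, N)), 0.0)   # transmission probabilities

    def run_cascade(source):
        t_act = np.full(N, np.inf)                    # activation time of each node
        t_act[source] = 0
        for t in range(T):
            for i in np.flatnonzero(t_act == t):      # nodes activated at time t
                for j in np.flatnonzero(adj[i]):
                    if t_act[j] == np.inf and rng.random() < trans[i, j]:
                        t_act[j] = t + 1
        return t_act

    observed = rng.choice(N, size=N // 2, replace=False)   # partial observations
    cascade = run_cascade(source=0)
    print({int(i): cascade[i] for i in observed})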
