Search Results for author: Theodor Misiakiewicz

Found 18 papers, 3 papers with code

Asymptotics of Random Feature Regression Beyond the Linear Scaling Regime

no code implementations13 Mar 2024 Hong Hu, Yue M. Lu, Theodor Misiakiewicz

On the other hand, if $p = o(n)$, the number of random features $p$ is the limiting factor and RFRR test error matches the approximation error of the random feature model class (akin to taking $n = \infty$).
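
A minimal numpy sketch of the random-feature ridge regression (RFRR) model discussed above, in a feature-limited setting with $p$ much smaller than $n$; the dimensions, ReLU activation and target function are illustrative placeholders, not the paper's setup.

    # Random-feature ridge regression (RFRR) with p features and n samples.
    # All dimensions, the activation and the target are illustrative choices.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n, p, lam = 50, 2000, 400, 1e-3        # p = o(n): feature-limited regime

    beta = rng.standard_normal(d)
    f_star = lambda x: np.tanh(x @ beta)      # hypothetical target function

    X = rng.standard_normal((n, d)) / np.sqrt(d)
    X_test = rng.standard_normal((500, d)) / np.sqrt(d)
    y, y_test = f_star(X), f_star(X_test)

    W = rng.standard_normal((p, d))           # random first-layer weights, kept fixed
    Phi = np.maximum(X @ W.T, 0.0)            # ReLU random features, shape (n, p)
    Phi_test = np.maximum(X_test @ W.T, 0.0)

    a = np.linalg.solve(Phi.T @ Phi + lam * n * np.eye(p), Phi.T @ y)   # ridge solution
    print(np.mean((Phi_test @ a - y_test) ** 2))                        # test error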

regression

A non-asymptotic theory of Kernel Ridge Regression: deterministic equivalents, test error, and GCV estimator

no code implementations13 Mar 2024 Theodor Misiakiewicz, Basil Saeed

Specifically, we establish in this setting a non-asymptotic deterministic approximation for the test error of KRR -- with explicit non-asymptotic bounds -- that only depends on the eigenvalues and the target function alignment to the eigenvectors of the kernel.
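
A minimal sketch of kernel ridge regression and the standard GCV estimate of its risk, to make the objects concrete; the Gaussian kernel, synthetic data and regularization are placeholders, and this is not the paper's deterministic equivalent.

    # Kernel ridge regression (KRR) and the usual GCV risk estimate.
    # Kernel, data and regularization are illustrative placeholders.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, lam = 500, 20, 1e-2

    X = rng.standard_normal((n, d)) / np.sqrt(d)
    y = np.sin(X @ np.ones(d)) + 0.1 * rng.standard_normal(n)

    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / 2.0)                               # Gaussian kernel matrix

    S = K @ np.linalg.inv(K + lam * n * np.eye(n))      # smoother (hat) matrix
    resid = y - S @ y

    # GCV(lam) = (1/n) ||(I - S) y||^2 / ((1/n) tr(I - S))^2
    gcv = (resid @ resid / n) / (1.0 - np.trace(S) / n) ** 2
    print(gcv)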

Six Lectures on Linearized Neural Networks

no code implementations25 Aug 2023 Theodor Misiakiewicz, Andrea Montanari

In these six lectures, we examine what can be learnt about the behavior of multi-layer neural networks from the analysis of linear models.

regression

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

no code implementations21 Feb 2023 Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $\tilde\Theta (d^{\max(\mathrm{Leap}(f), 2)})$.
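
A rough worked illustration of the conjecture (an informal paraphrase; the exact definition of $\mathrm{Leap}$ is in the paper): loosely, $\mathrm{Leap}(f)$ is the largest number of new coordinates a single monomial of $f$ introduces when the monomials are learned in a favorable order, so a staircase-like function has leap 1 while an isolated degree-3 monomial has leap 3:

    $\mathrm{Leap}(x_1 + x_1 x_2 + x_1 x_2 x_3) = 1 \;\Rightarrow\; \tilde\Theta(d^{\max(1, 2)}) = \tilde\Theta(d^2)$
    $\mathrm{Leap}(x_1 x_2 x_3) = 3 \;\Rightarrow\; \tilde\Theta(d^{\max(3, 2)}) = \tilde\Theta(d^3)$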

Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression

no code implementations30 May 2022 Lechao Xiao, Hong Hu, Theodor Misiakiewicz, Yue M. Lu, Jeffrey Pennington

As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes.

regression

Spectrum of inner-product kernel matrices in the polynomial regime and multiple descent phenomenon in kernel ridge regression

no code implementations21 Apr 2022 Theodor Misiakiewicz

In this regime, the kernel matrix is well approximated by its degree-$\ell$ polynomial approximation and can be decomposed into a low-rank spike matrix, identity and a `Gegenbauer matrix' with entries $Q_\ell (\langle \textbf{x}_i , \textbf{x}_j \rangle)$, where $Q_\ell$ is the degree-$\ell$ Gegenbauer polynomial.
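
A small numpy/scipy sketch of the `Gegenbauer matrix' above for points on the sphere; scipy's $C_\ell^{(\alpha)}$ normalization may differ from the paper's $Q_\ell$ by a constant, and $n$, $d$, $\ell$ are illustrative.

    # Entrywise degree-ell Gegenbauer matrix Q_ell(<x_i, x_j>) on the unit sphere.
    # scipy's C_ell^(alpha) differs from the paper's normalization by a constant.
    import numpy as np
    from scipy.special import eval_gegenbauer

    rng = np.random.default_rng(2)
    n, d, ell = 300, 100, 2
    alpha = (d - 2) / 2.0                     # Gegenbauer parameter for S^{d-1}

    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)    # x_i uniform on the sphere

    G = X @ X.T                               # inner products <x_i, x_j>
    Q = eval_gegenbauer(ell, alpha, G)        # entrywise Q_ell(<x_i, x_j>)
    print(Q.shape)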

The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks

no code implementations17 Feb 2022 Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints.

Learning with convolution and pooling operations in kernel methods

no code implementations16 Nov 2021 Theodor Misiakiewicz, Song Mei

Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks.
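
A toy 1-D illustration of a convolutional kernel in the spirit described: a base inner-product kernel applied patchwise, then averaged over patch locations (pooling). This is a generic construction for illustration, not the specific hierarchical kernels analyzed in the paper.

    # Toy 1-D convolutional kernel: apply a base kernel to corresponding patches
    # and average over locations (global pooling). A simple generic instance.
    import numpy as np

    def patches(x, q):
        """All contiguous length-q patches of a 1-D signal x (cyclic)."""
        return np.stack([np.roll(x, -i)[:q] for i in range(len(x))])

    def base_kernel(u, v):
        return (1.0 + u @ v) ** 3            # inner-product (polynomial) kernel on patches

    def conv_kernel(x, y, q=5):
        Px, Py = patches(x, q), patches(y, q)
        return np.mean([base_kernel(px, py) for px, py in zip(Px, Py)])

    rng = np.random.default_rng(3)
    x, y = rng.standard_normal(32), rng.standard_normal(32)
    print(conv_kernel(x, y))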

Image Classification

Minimum complexity interpolation in random features models

no code implementations30 Mar 2021 Michael Celentano, Theodor Misiakiewicz, Andrea Montanari

We study random features approximations to these norms and show that, for $p>1$, the number of random features required to approximate the original learning problem is upper bounded by a polynomial in the sample size.

Learning with invariances in random features and kernel models

no code implementations25 Feb 2021 Song Mei, Theodor Misiakiewicz, Andrea Montanari

Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties.

Data Augmentation

Generalization error of random features and kernel methods: hypercontractivity and kernel matrix concentration

no code implementations26 Jan 2021 Song Mei, Theodor Misiakiewicz, Andrea Montanari

We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N\le n^{1-\delta}$ for some $\delta>0$.

regression

When Do Neural Networks Outperform Kernel Methods?

1 code implementation NeurIPS 2020 Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance.

Image Classification

Limitations of Lazy Training of Two-layers Neural Network

1 code implementation NeurIPS 2019 Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari

We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.


Limitations of Lazy Training of Two-layers Neural Networks

1 code implementation21 Jun 2019 Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.
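
A short sketch generating data from the two models in the abstract; the particular quadratic $f_*$ and the mixture covariances are illustrative placeholders.

    # Synthetic data for the two models in the abstract; the quadratic f_* and
    # the mixture covariances below are placeholders, not the paper's choices.
    import numpy as np

    rng = np.random.default_rng(4)
    n, d = 1000, 50

    # Model (1): Gaussian feature vectors, responses y_i = f_*(x_i) for a quadratic f_*
    A = rng.standard_normal((d, d)) / d
    b = rng.standard_normal(d) / np.sqrt(d)
    X1 = rng.standard_normal((n, d))
    y1 = np.einsum('ni,ij,nj->n', X1, A, X1) + X1 @ b

    # Model (2): mixture of two centered d-dimensional Gaussians (differing here
    # only in isotropic scale), with y_i the class label in {-1, +1}
    labels = rng.integers(0, 2, size=n)
    scales = np.where(labels == 1, 1.2, 0.8)
    X2 = rng.standard_normal((n, d)) * scales[:, None]
    y2 = 2 * labels - 1

    print(X1.shape, y1.shape, X2.shape, y2.shape)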


Linearized two-layers neural networks in high dimension

no code implementations27 Apr 2019 Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$.
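
For concreteness, a minimal sketch of the two feature maps being linearized: the random-features (RF) map, which trains only the second layer, and the neural-tangent (NT) map, which linearizes in the first-layer weights; the width and scalings are illustrative.

    # RF and NT feature maps of a two-layer ReLU network with N neurons and
    # fixed random first-layer weights W. Dimensions are illustrative.
    import numpy as np

    rng = np.random.default_rng(7)
    d, N = 30, 100
    W = rng.standard_normal((N, d)) / np.sqrt(d)

    def rf_features(x):
        # RF model: train only the second layer; features are sigma(<w_j, x>)
        return np.maximum(W @ x, 0.0)                              # shape (N,)

    def nt_features(x):
        # NT model: linearize in the first-layer weights; features are
        # sigma'(<w_j, x>) * x stacked over neurons j
        return ((W @ x > 0).astype(float)[:, None] * x).ravel()    # shape (N * d,)

    x = rng.standard_normal(d)
    print(rf_features(x).shape, nt_features(x).shape)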

regression

Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

no code implementations16 Feb 2019 Song Mei, Theodor Misiakiewicz, Andrea Montanari

Earlier work shows that (under some regularity assumptions), the mean field description is accurate as soon as the number of hidden units is much larger than the dimension $D$.

Efficient reconstruction of transmission probabilities in a spreading process from partial observations

no code implementations23 Sep 2015 Andrey Y. Lokhov, Theodor Misiakiewicz

A number of recent papers introduced efficient algorithms for the estimation of spreading parameters, based on the maximization of the likelihood of observed cascades, assuming that the full information for all the nodes in the network is available.
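
A minimal simulation of the kind of spreading process whose transmission probabilities one would want to reconstruct: a discrete-time SI / independent-cascade dynamics with per-edge transmission probabilities, with activation times recorded only for a subset of observed nodes. The graph, probabilities and observation scheme are illustrative, and this is only the forward model, not the paper's reconstruction algorithm.

    # Simulate a discrete-time SI / independent-cascade spreading process with
    # per-edge transmission probabilities, observing only a subset of nodes.
    import numpy as np

    rng = np.random.default_rng(5)
    N, T = 30, 15

    adj = rng.random((N, N)) < 0.1                    # random directed graph
    np.fill_diagonal(adj, False)
    trans = np.where(adj, rng.uniform(0.1, 0.6, (N, N)), 0.0)   # transmission probabilities

    def run_cascade(source):
        t_act = np.full(N, np.inf)                    # activation time of each node
        t_act[source] = 0
        for t in range(T):
            for i in np.flatnonzero(t_act == t):      # nodes activated at time t
                for j in np.flatnonzero(adj[i]):
                    if t_act[j] == np.inf and rng.random() < trans[i, j]:
                        t_act[j] = t + 1
        return t_act

    observed = rng.choice(N, size=N // 2, replace=False)   # partial observations
    cascade = run_cascade(source=0)
    print({int(i): cascade[i] for i in observed})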
