Search Results for author: Andrea Montanari

Found 75 papers, 8 papers with code

Scaling laws for learning with real and surrogate data

no code implementations6 Feb 2024 Ayush Jain, Andrea Montanari, Eren Sasoglu

Collecting large quantities of high-quality data is often prohibitively expensive or impractical, and is a crucial bottleneck in machine learning.

Universality of max-margin classifiers

no code implementations29 Sep 2023 Andrea Montanari, Feng Ruan, Basil Saeed, Youngtak Sohn

Working in the high-dimensional regime in which the number of features $p$, the number of samples $n$ and the input dimension $d$ (in the nonlinear featurization setting) diverge, with ratios of order one, we prove a universality result establishing that the asymptotic behavior is completely determined by the expected covariance of feature vectors and by the covariance between features and labels.

Binary Classification

Towards a statistical theory of data selection under weak supervision

no code implementations25 Sep 2023 Germain Kolossov, Andrea Montanari, Pulkit Tandon

Given a sample of size $N$, it is often useful to select a subsample of smaller size $n<N$ to be used for statistical estimation or learning.
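
A toy numpy illustration of the subsample-selection setting described above: from $N$ labeled points, keep the $n$ with the largest leverage scores (a hypothetical selection rule, not the paper's procedure) and fit least squares on the subsample.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, d = 2000, 200, 10                          # full sample, subsample, dimension (illustrative)

X = rng.standard_normal((N, d))
theta = rng.standard_normal(d)
y = X @ theta + rng.standard_normal(N)

# hypothetical selection rule: keep the n points with the largest leverage scores
G_inv = np.linalg.inv(X.T @ X)
leverage = np.einsum('ij,jk,ik->i', X, G_inv, X)  # diagonal of the hat matrix
keep = np.argsort(leverage)[-n:]

theta_sel = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
theta_unif = np.linalg.lstsq(X[:n], y[:n], rcond=None)[0]   # uniform subsample of the same size
print(np.linalg.norm(theta_sel - theta), np.linalg.norm(theta_unif - theta))
```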

Six Lectures on Linearized Neural Networks

no code implementations25 Aug 2023 Theodor Misiakiewicz, Andrea Montanari

In these six lectures, we examine what can be learnt about the behavior of multi-layer neural networks from the analysis of linear models.

regression

Sampling, Diffusions, and Stochastic Localization

no code implementations18 May 2023 Andrea Montanari

Diffusions are a successful technique to sample from high-dimensional distributions that can be either explicitly given or learnt from a collection of samples.

Denoising

Learning time-scales in two-layers neural networks

no code implementations28 Feb 2023 Raphaël Berthier, Andrea Montanari, Kangjie Zhou

In this paper, we study the gradient flow dynamics of a wide two-layer neural network in high dimension, when data are distributed according to a single-index model (i.e., the target function depends on a one-dimensional projection of the covariates).


Dimension free ridge regression

no code implementations16 Oct 2022 Chen Cheng, Andrea Montanari

However, random matrix theory is largely focused on the proportional asymptotics in which the number of columns grows proportionally to the number of rows of the data matrix.

regression

Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks

no code implementations14 Jun 2022 Andrea Montanari, Kangjie Zhou

Denoting by $\mathscr{F}_{m, \alpha}$ the set of probability distributions in $\mathbb{R}^m$ that arise as low-dimensional projections in this limit, we establish new inner and outer bounds on $\mathscr{F}_{m, \alpha}$.

Adversarial Examples in Random Neural Networks with General Activations

no code implementations31 Mar 2022 Andrea Montanari, Yuchen Wu

A substantial body of empirical work documents the lack of robustness in deep learning models to adversarial examples.

Universality of empirical risk minimization

no code implementations17 Feb 2022 Andrea Montanari, Basil Saeed

In particular, the asymptotics of these quantities can be computed $-$to leading order$-$ under a simpler model in which the feature vectors ${\boldsymbol x}_i$ are replaced by Gaussian vectors ${\boldsymbol g}_i$ with the same covariance.
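
A quick numerical check of the universality statement, under illustrative choices: logistic-regression ERM on non-Gaussian features ${\boldsymbol x}_i$ (a ReLU featurization of Gaussian inputs) versus Gaussian surrogates ${\boldsymbol g}_i$ with matched mean and covariance; the teacher model and sizes below are assumptions for the demo.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, p = 2000, 40, 120                          # samples, latent dimension, feature dimension

W = rng.standard_normal((p, d)) / np.sqrt(d)
beta = rng.standard_normal(p)

def labels(F):                                   # labels from a logistic teacher on the features
    return (rng.random(len(F)) < 1.0 / (1.0 + np.exp(-F @ beta))).astype(int)

Z = rng.standard_normal((n, d))
X = np.maximum(Z @ W.T, 0.0)                     # non-Gaussian features x_i
mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False) + 1e-6 * np.eye(p)
G = rng.multivariate_normal(mu, Sigma, size=n)   # Gaussian surrogates g_i, same mean and covariance

for name, F in [("relu features", X), ("gaussian surrogate", G)]:
    y = labels(F)
    err = 1.0 - LogisticRegression(max_iter=2000).fit(F, y).score(F, y)
    print(f"{name}: training error {err:.3f}")
```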

Tractability from overparametrization: The example of the negative perceptron

no code implementations28 Oct 2021 Andrea Montanari, Yiqiao Zhong, Kangjie Zhou

In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i, y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i\in\{+1,-1\}$ is a binary label.

Minimum complexity interpolation in random features models

no code implementations30 Mar 2021 Michael Celentano, Theodor Misiakiewicz, Andrea Montanari

We study random features approximations to these norms and show that, for $p>1$, the number of random features required to approximate the original learning problem is upper bounded by a polynomial in the sample size.

Deep learning: a statistical viewpoint

no code implementations16 Mar 2021 Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin

We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting.

Learning with invariances in random features and kernel models

no code implementations25 Feb 2021 Song Mei, Theodor Misiakiewicz, Andrea Montanari

Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties.

Data Augmentation

Generalization error of random features and kernel methods: hypercontractivity and kernel matrix concentration

no code implementations26 Jan 2021 Song Mei, Theodor Misiakiewicz, Andrea Montanari

We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N\le n^{1-\delta}$ for some $\delta>0$.

regression
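
A minimal numpy sketch of the random-features ridge regression estimator whose test error is discussed above; the ReLU activation, the single-index target, and all sizes are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_test, d, N, lam = 500, 1000, 30, 200, 1e-3   # samples, test points, dimension, features, ridge

theta = rng.standard_normal(d)
target = lambda X: np.tanh(X @ theta / np.sqrt(d))          # assumed smooth target

X, X_test = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
y, y_test = target(X), target(X_test)

W = rng.standard_normal((N, d)) / np.sqrt(d)                # random first-layer weights
relu = lambda Z: np.maximum(Z, 0.0)
Phi, Phi_test = relu(X @ W.T), relu(X_test @ W.T)           # random ReLU features

a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)   # ridge regression in feature space
print("test error:", np.mean((Phi_test @ a - y_test) ** 2))
```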

The Lasso with general Gaussian designs with applications to hypothesis testing

no code implementations27 Jul 2020 Michael Celentano, Andrea Montanari, Yuting Wei

On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one.

Two-sample testing

When Do Neural Networks Outperform Kernel Methods?

1 code implementation NeurIPS 2020 Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance.

Image Classification

The estimation error of general first order methods

no code implementations28 Feb 2020 Michael Celentano, Andrea Montanari, Yuchen Wu

These lower bounds are optimal in the sense that there exist algorithms whose estimation error matches the lower bounds up to asymptotically negligible terms.

Retrieval

Limitations of Lazy Training of Two-layers Neural Network

1 code implementation NeurIPS 2019 Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari

We study the supervised learning problem under either of the following two models: (1) Feature vectors x_i are d-dimensional Gaussian and responses are y_i = f_*(x_i) for f_* an unknown quadratic function; (2) Feature vectors x_i are distributed as a mixture of two d-dimensional centered Gaussians, and y_i's are the corresponding class labels.

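A small numpy sketch of the two data models stated in the abstract, with illustrative sizes; the covariance difference in model (2) is an assumption, and the training procedures compared in the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 50                                    # illustrative sizes

# model (1): Gaussian feature vectors, responses y_i = f_*(x_i) for a quadratic f_*
B = rng.standard_normal((d, d)); B = (B + B.T) / (2.0 * np.sqrt(d))
X1 = rng.standard_normal((n, d))
y1 = np.einsum('ni,ij,nj->n', X1, B, X1)

# model (2): mixture of two centered d-dimensional Gaussians, y_i the class label
labels = rng.integers(0, 2, size=n)
scales = np.where(labels == 1, 1.2, 0.8)[:, None]  # assumed covariance difference between classes
X2 = scales * rng.standard_normal((n, d))
y2 = 2 * labels - 1
```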

The generalization error of random features regression: Precise asymptotics and double descent curve

no code implementations14 Aug 2019 Song Mei, Andrea Montanari

We compute the precise asymptotics of the test error, in the limit $N, n, d\to \infty$ with $N/d$ and $n/d$ fixed.

regression

Limitations of Lazy Training of Two-layers Neural Networks

1 code implementation21 Jun 2019 Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.


Linearized two-layers neural networks in high dimension

no code implementations27 Apr 2019 Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$.

regression +1

Surprises in High-Dimensional Ridgeless Least Squares Interpolation

no code implementations19 Mar 2019 Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani

Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type.


Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

no code implementations16 Feb 2019 Song Mei, Theodor Misiakiewicz, Andrea Montanari

Earlier work shows that (under some regularity assumptions), the mean field description is accurate as soon as the number of hidden units is much larger than the dimension $D$.

Analysis of a Two-Layer Neural Network via Displacement Convexity

no code implementations5 Jan 2019 Adel Javanmard, Marco Mondelli, Andrea Montanari

We prove that, in the limit in which the number of neurons diverges, the evolution of gradient descent converges to a Wasserstein gradient flow in the space of probability distributions over $\Omega$.


Contextual Stochastic Block Models

no code implementations NeurIPS 2018 Yash Deshpande, Andrea Montanari, Elchanan Mossel, Subhabrata Sen

We provide the first information theoretic tight analysis for inference of latent community structure given a sparse graph along with high dimensional node covariates, correlated with the same latent communities.

A Mean Field View of the Landscape of Two-Layers Neural Networks

no code implementations18 Apr 2018 Song Mei, Andrea Montanari, Phan-Minh Nguyen

Does SGD converge to a global optimum of the risk or only to a local optimum?

On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition

no code implementations20 Feb 2018 Marco Mondelli, Andrea Montanari

Our conclusion holds for a `natural data distribution', namely standard Gaussian feature vectors $\boldsymbol x$, and output distributed according to a two-layer neural network with random isotropic weights, and under a certain complexity-theoretic assumption on tensor decomposition.

Tensor Decomposition

An Instability in Variational Inference for Topic Models

no code implementations2 Feb 2018 Behrooz Ghorbani, Hamid Javadi, Andrea Montanari

Namely, for certain regimes of the model parameters, variational inference outputs a non-trivial decomposition into topics.

Topic Models Variational Inference

The landscape of the spiked tensor model

no code implementations15 Nov 2017 Gerard Ben Arous, Song Mei, Andrea Montanari, Mihai Nica

We compute the expected number of critical points and local maxima of this objective function, show that it is exponential in the dimension $n$, and give exact formulas for the exponential growth rate.

Estimation of Low-Rank Matrices via Approximate Message Passing

1 code implementation6 Nov 2017 Andrea Montanari, Ramji Venkataramanan

In this paper we present a practical algorithm that can achieve Bayes-optimal accuracy above the spectral threshold.

Community Detection

Inference in Graphical Models via Semidefinite Programming Hierarchies

no code implementations NeurIPS 2017 Murat A. Erdogdu, Yash Deshpande, Andrea Montanari

We demonstrate that the resulting algorithm can solve problems with tens of thousands of variables within minutes, and outperforms BP and GBP on practical problems such as image denoising and Ising spin glasses.

Combinatorial Optimization Computational Efficiency +1

Learning Combinations of Sigmoids Through Gradient Estimation

no code implementations22 Aug 2017 Stratis Ioannidis, Andrea Montanari

In a nutshell, we estimate the gradient of the regression function at a set of random points, and cluster the estimated gradients.

regression
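
A schematic of the two steps named above, under assumed choices (central-difference gradients of a known two-sigmoid regression function, k-means from scikit-learn); the paper's actual estimator and its guarantees are more refined.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
d, n_pts, eps = 20, 500, 1e-4

w1, w2 = rng.standard_normal(d), rng.standard_normal(d)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
f = lambda x: 0.5 * sigmoid(x @ w1) + 0.5 * sigmoid(x @ w2)   # combination of sigmoids

# step 1: estimate the gradient of the regression function at random points
pts = rng.standard_normal((n_pts, d))
grads = np.empty((n_pts, d))
for j in range(d):
    e = np.zeros(d); e[j] = eps
    grads[:, j] = (f(pts + e) - f(pts - e)) / (2.0 * eps)     # central differences

# step 2: cluster the estimated gradients (each one lies in span{w1, w2})
centers = KMeans(n_clusters=2, n_init=10, random_state=0).fit(grads).cluster_centers_
print(centers.shape)
```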

Fundamental Limits of Weak Recovery with Applications to Phase Retrieval

no code implementations20 Aug 2017 Marco Mondelli, Andrea Montanari

In phase retrieval we want to recover an unknown signal $\boldsymbol x\in\mathbb C^d$ from $n$ quadratic measurements of the form $y_i = |\langle{\boldsymbol a}_i,{\boldsymbol x}\rangle|^2+w_i$ where $\boldsymbol a_i\in \mathbb C^d$ are known sensing vectors and $w_i$ is measurement noise.

Retrieval
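
A numpy sketch of the measurement model above together with a plain spectral estimator (top eigenvector of the $y$-weighted empirical covariance of the sensing vectors); the optimally preprocessed spectral method analyzed in the paper is not reproduced, and sizes and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 64, 1024, 0.1

x = rng.standard_normal(d) + 1j * rng.standard_normal(d)
x /= np.linalg.norm(x)                                     # unknown unit-norm signal
A = (rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))) / np.sqrt(2.0)
y = np.abs(A @ x) ** 2 + sigma * rng.standard_normal(n)    # y_i = |<a_i, x>|^2 + w_i

M = (A.T * y) @ A.conj() / n                               # (1/n) * sum_i y_i a_i a_i^H
eigvals, eigvecs = np.linalg.eigh(M)
x_hat = eigvecs[:, -1]                                     # top eigenvector
print("overlap |<x_hat, x>| =", round(abs(np.vdot(x_hat, x)), 3))
```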

Non-negative Matrix Factorization via Archetypal Analysis

no code implementations8 May 2017 Hamid Javadi, Andrea Montanari

In this paper, we study an approach to NMF that can be traced back to the work of Cutler and Breiman (1994) and does not require the data to be separable, while providing a generally unique decomposition.

Spectral algorithms for tensor completion

no code implementations23 Dec 2016 Andrea Montanari, Nike Sun

In the tensor completion problem, one seeks to estimate a low-rank tensor based on a random sample of revealed entries.

How Well Do Local Algorithms Solve Semidefinite Programs?

no code implementations17 Oct 2016 Zhou Fan, Andrea Montanari

Several probabilistic models from high-dimensional statistics and machine learning reveal an intriguing --and yet poorly understood-- dichotomy.

The Landscape of Empirical Risk for Non-convex Losses

no code implementations22 Jul 2016 Song Mei, Yu Bai, Andrea Montanari

We establish uniform convergence of the gradient and Hessian of the empirical risk to their population counterparts, as soon as the number of samples becomes larger than the number of unknown parameters (modulo logarithmic factors).

Binary Classification General Classification +1

Performance of a community detection algorithm based on semidefinite programming

no code implementations30 Mar 2016 Adel Javanmard, Andrea Montanari, Federico Ricci-Tersenghi

In this paper we study in detail several practical aspects of this new algorithm based on semidefinite programming for the detection of the planted partition.

Community Detection Stochastic Block Model

A Grothendieck-type inequality for local maxima

no code implementations13 Mar 2016 Andrea Montanari

A large number of problems in optimization, machine learning, signal processing can be effectively addressed by suitable semidefinite programming (SDP) relaxations.


On the Limitation of Spectral Methods: From the Gaussian Hidden Clique Problem to Rank-One Perturbations of Gaussian Tensors

no code implementations NeurIPS 2015 Andrea Montanari, Daniel Reichman, Ofer Zeitouni

We consider the following detection problem: given a realization of a symmetric matrix $X$ of dimension $n$, distinguish between the hypothesis that all upper triangular variables are i.i.d.
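
A toy instance of this detection problem with a largest-eigenvalue test, the kind of spectral baseline whose limitations the paper studies; the planted-set size and mean shift below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, mu = 800, 40, 2.0                        # matrix size, hidden-set size, mean shift

def sym_gauss(n):
    Z = rng.standard_normal((n, n))
    return (Z + Z.T) / np.sqrt(2.0)            # symmetric, N(0,1) off-diagonal entries

X_null = sym_gauss(n)                          # null: i.i.d. Gaussian entries
X_alt = sym_gauss(n)
S = rng.choice(n, size=k, replace=False)
X_alt[np.ix_(S, S)] += mu                      # alternative: elevated mean on a hidden k x k block

for name, X in [("null", X_null), ("planted", X_alt)]:
    print(name, "top eigenvalue:", round(np.linalg.eigvalsh(X)[-1], 1),
          "| null edge ~ 2*sqrt(n) =", round(2 * np.sqrt(n), 1))
```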

Convergence rates of sub-sampled Newton methods

no code implementations NeurIPS 2015 Murat A. Erdogdu, Andrea Montanari

In this regime, algorithms which utilize sub-sampling techniques are known to be effective.

De-biasing the Lasso: Optimal Sample Size for Gaussian Designs

no code implementations11 Aug 2015 Adel Javanmard, Andrea Montanari

When the covariance is known, we prove that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_0 = o(n/ (\log p)^2)$.
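
A minimal sketch of the debiasing step in the simplest case of a known identity covariance, with illustrative sizes; scikit-learn's Lasso is used for the initial estimate and the penalty level is an assumption, not the paper's tuning.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s0, sigma = 400, 600, 10, 1.0            # covariance is the identity here

X = rng.standard_normal((n, p))
theta0 = np.zeros(p); theta0[:s0] = 1.0
y = X @ theta0 + sigma * rng.standard_normal(n)

lam = 2.0 * sigma * np.sqrt(np.log(p) / n)     # assumed penalty level
theta_lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

# debiasing: with Sigma = I take M = I, so theta_d = theta_lasso + (1/n) X^T (y - X theta_lasso)
theta_d = theta_lasso + X.T @ (y - X @ theta_lasso) / n

# approximate 95% confidence interval for a null coordinate (asymptotic variance sigma^2 / n)
j = p - 1
half = 1.96 * sigma / np.sqrt(n)
print(f"theta_d[{j}] = {theta_d[j]:.3f}, CI = ({theta_d[j] - half:.3f}, {theta_d[j] + half:.3f})")
```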

Improved Sum-of-Squares Lower Bounds for Hidden Clique and Hidden Submatrix Problems

no code implementations23 Feb 2015 Yash Deshpande, Andrea Montanari

Here we consider the degree-$4$ SOS relaxation, and study the construction of \cite{meka2013association} to prove that SOS fails unless $k\ge C\, n^{1/3}/\log n$.

Two-sample testing

On Online Control of False Discovery Rate

1 code implementation22 Feb 2015 Adel Javanmard, Andrea Montanari

Given a sequence of null hypotheses $\mathcal{H}(n) = (H_1,..., H_n)$, Benjamini and Hochberg \cite{benjamini1995controlling} introduced the false discovery rate (FDR) criterion, which is the expected proportion of false positives among rejected null hypotheses, and proposed a testing procedure that controls FDR below a pre-assigned significance level.
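
For reference, a minimal implementation of the (offline) Benjamini-Hochberg step-up procedure mentioned above; the online procedures proposed in the paper, which process hypotheses sequentially, are not reproduced here.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest i with p_(i) <= alpha * i / m."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# toy usage: 90 null p-values and 10 small ones
rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=90), rng.uniform(high=1e-3, size=10)])
print(benjamini_hochberg(pvals).sum(), "rejections")
```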

Finding One Community in a Sparse Graph

no code implementations19 Feb 2015 Andrea Montanari

This can be regarded as a model for the problem of finding a tightly knitted community in a social network, or a cluster in a relational dataset.

A statistical model for tensor PCA

no code implementations NeurIPS 2014 Andrea Montanari, Emile Richard

This is possibly related to a fundamental limitation of computationally tractable estimators for this problem.

Statistical Estimation: From Denoising to Sparse Regression and Hidden Cliques

no code implementations19 Sep 2014 Eric W. Tramel, Santhosh Kumar, Andrei Giurgiu, Andrea Montanari

These notes review six lectures given by Prof. Andrea Montanari on the topic of statistical estimation for linear models.

Denoising regression

Computational Implications of Reducing Data to Sufficient Statistics

no code implementations12 Sep 2014 Andrea Montanari

Given a large dataset and an estimation task, it is common to pre-process the data by reducing them to a set of sufficient statistics.

Privacy Tradeoffs in Predictive Analytics

no code implementations31 Mar 2014 Stratis Ioannidis, Andrea Montanari, Udi Weinsberg, Smriti Bhagat, Nadia Fawaz, Nina Taft

Recent research has demonstrated that several private user attributes (such as political affiliation, sexual orientation, and gender) can be inferred from such data.

Attribute Privacy Preserving

Estimating LASSO Risk and Noise Level

no code implementations NeurIPS 2013 Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari

In this context, we develop new estimators for the $\ell_2$ estimation risk $\|\hat{\theta}-\theta_0\|_2$ and the variance of the noise.

Denoising

Sparse PCA via Covariance Thresholding

no code implementations NeurIPS 2014 Yash Deshpande, Andrea Montanari

In an influential paper, \cite{johnstone2004sparse} introduced a simple algorithm that estimates the support of the principal vectors $\mathbf{v}_1,\dots,\mathbf{v}_r$ by the largest entries in the diagonal of the empirical covariance.
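
A toy sketch of the diagonal-thresholding baseline described above (support estimated from the largest diagonal entries of the empirical covariance), under an assumed spiked-covariance model; the covariance-thresholding algorithm the paper proposes is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, beta = 400, 1000, 20, 10.0            # samples, dimension, sparsity, spike strength (illustrative)

v = np.zeros(p); v[:k] = 1.0 / np.sqrt(k)      # sparse principal direction
u = rng.standard_normal((n, 1))
X = rng.standard_normal((n, p)) + np.sqrt(beta) * u * v    # spiked-covariance samples

Sigma_hat = X.T @ X / n                        # empirical covariance
support_est = np.argsort(np.diag(Sigma_hat))[-k:]          # k largest diagonal entries
print("fraction of true support recovered:", np.mean(np.isin(support_est, np.arange(k))))
```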

Learning Mixtures of Linear Classifiers

no code implementations11 Nov 2013 Yuekai Sun, Stratis Ioannidis, Andrea Montanari

We consider a discriminative learning (regression) problem, whereby the regression function is a convex combination of k linear classifiers.

regression

Nearly Optimal Sample Size in Hypothesis Testing for High-Dimensional Regression

no code implementations1 Nov 2013 Adel Javanmard, Andrea Montanari

In the regime where the number of parameters $p$ is comparable to or exceeds the sample size $n$, a successful approach uses an $\ell_1$-penalized least squares estimator, known as Lasso.

regression Two-sample testing +1

Confidence Intervals and Hypothesis Testing for High-Dimensional Regression

no code implementations NeurIPS 2013 Adel Javanmard, Andrea Montanari

This in turn implies that it is extremely challenging to quantify the \emph{uncertainty} associated with a certain parameter estimate.

regression Two-sample testing +1

Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition

no code implementations NeurIPS 2013 Adel Javanmard, Andrea Montanari

In the high-dimensional regression model a response variable is linearly related to $p$ covariates, but the sample size $n$ is smaller than $p$.

Model Selection regression +1

Hypothesis Testing in High-Dimensional Regression under the Gaussian Random Design Model: Asymptotic Theory

no code implementations17 Jan 2013 Adel Javanmard, Andrea Montanari

In this case we prove that a similar distributional characterization (termed `standard distributional limit') holds for $n$ much larger than $s_0(\log p)^2$.

Model Selection regression +1

Accelerated Time-of-Flight Mass Spectrometry

no code implementations18 Dec 2012 Morteza Ibrahimi, Andrea Montanari, George S Moore

We study a simple modification to the conventional time of flight mass spectrometry (TOFMS) where a \emph{variable} and (pseudo)-\emph{random} pulsing rate is used which allows for traces from different pulses to overlap.

The Noise-Sensitivity Phase Transition in Compressed Sensing

1 code implementation8 Apr 2010 David L. Donoho, Arian Maleki, Andrea Montanari

We develop formal expressions for the MSE of $\hat{x}^{\lambda}$, and evaluate its worst-case formal noise sensitivity over all types of k-sparse signals.

Statistics Theory Information Theory

Which graphical models are difficult to learn?

no code implementations NeurIPS 2009 Andrea Montanari, Jose A. Pereira

We consider the problem of learning the structure of Ising models (pairwise binary Markov random fields) from i.i.d.

Matrix Completion from Noisy Entries

1 code implementation NeurIPS 2009 Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh

Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations of a small, random subset of its entries.

Collaborative Filtering Matrix Completion
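
A minimal sketch of the noisy matrix completion setup using only a rescaled truncated SVD of the revealed entries; the trimming and refinement steps of the paper's algorithm are not reproduced, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, p_obs, sigma = 200, 200, 3, 0.3, 0.1

M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-r ground truth
mask = rng.random((m, n)) < p_obs                                # revealed entries
Y = np.where(mask, M + sigma * rng.standard_normal((m, n)), 0.0)

# spectral estimate: rescale by 1/p_obs and keep the top-r singular triplets
U, s, Vt = np.linalg.svd(Y / p_obs, full_matrices=False)
M_hat = (U[:, :r] * s[:r]) @ Vt[:r]
print("relative Frobenius error:", round(np.linalg.norm(M_hat - M) / np.linalg.norm(M), 3))
```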

Matrix Completion from a Few Entries

1 code implementation20 Jan 2009 Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh

In the process of proving these statements, we obtain a generalization of a celebrated result by Friedman-Kahn-Szemeredi and Feige-Ofek on the spectrum of sparse random matrices.

Matrix Completion

Solving Constraint Satisfaction Problems through Belief Propagation-guided decimation

no code implementations11 Sep 2007 Andrea Montanari, Federico Ricci-Tersenghi, Guilhem Semerjian

Message passing algorithms have proved surprisingly successful in solving hard constraint satisfaction problems on sparse random graphs.
