no code implementations • 14 Apr 2024 • Jin-Hong Du, Zhenghao Zeng, Edward H. Kennedy, Larry Wasserman, Kathryn Roeder
In this paper, we propose a generic semiparametric inference framework for doubly robust estimation with multiple derived outcomes, which also encompasses the usual setting of multiple outcomes when the response of each unit is available.
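The paper builds on doubly robust estimation. As an illustrative baseline (not the paper's multi-outcome framework), here is a minimal sketch of the classic AIPW doubly robust estimator of an average treatment effect; the simulated data, coefficients, and model choices are ours:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 2))
e = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.25 * X[:, 1])))       # true propensity
A = rng.binomial(1, e)                                         # treatment indicator
Y = 2.0 * A + X @ np.array([1.0, -1.0]) + rng.normal(size=n)   # true ATE = 2

# Nuisance estimates: propensity score and per-arm outcome regressions
ps = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
mu1 = LinearRegression().fit(X[A == 1], Y[A == 1]).predict(X)
mu0 = LinearRegression().fit(X[A == 0], Y[A == 0]).predict(X)

# AIPW estimate: outcome-model difference plus inverse-propensity-weighted residuals
ate = np.mean(mu1 - mu0
              + A * (Y - mu1) / ps
              - (1 - A) * (Y - mu0) / (1 - ps))
```

The estimator remains consistent if either the propensity model or the outcome model is correct, which is the "doubly robust" property the paper's framework extends to multiple derived outcomes.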
1 code implementation • 22 Mar 2024 • Alec McClean, Sivaraman Balakrishnan, Edward H. Kennedy, Larry Wasserman
Then, assuming the nuisance functions are Hölder smooth, but without assuming knowledge of the true smoothness level or the covariate density, we establish that DCDR estimators with several linear smoothers are semiparametric efficient under minimal conditions and achieve fast convergence rates in the non-$\sqrt{n}$ regime.
no code implementations • 29 Feb 2024 • Ilmun Kim, Larry Wasserman, Sivaraman Balakrishnan, Matey Neykov
Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming.
1 code implementation • 13 Sep 2023 • Jin-Hong Du, Larry Wasserman, Kathryn Roeder
Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes.
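For the multiple-testing setting described above, a standard starting point is the Benjamini-Hochberg procedure for false discovery rate control; the following sketch is a generic implementation, not the paper's method:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return sorted indices of hypotheses rejected at FDR level q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m        # step-up thresholds q*i/m
    below = p[order] <= thresh
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.where(below)[0])              # largest i with p_(i) <= q*i/m
    return np.sort(order[:k + 1])               # reject all p-values up to p_(k)
```

With tens of thousands of genes, the same call applies unchanged; only `pvals` grows.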
no code implementations • 6 May 2023 • Sivaraman Balakrishnan, Edward H. Kennedy, Larry Wasserman
These first-order methods are, however, provably suboptimal in a minimax sense for functional estimation when the nuisance functions live in Hölder-type function spaces.
no code implementations • 10 Mar 2023 • Isabella Verdinelli, Larry Wasserman
We are particularly interested in the effect of correlation between features which can obscure interpretability.
no code implementations • 21 Dec 2021 • James Leiner, Boyan Duan, Larry Wasserman, Aaditya Ramdas
Rasines and Young (2022) offer an alternative approach that uses additive Gaussian noise; this enables post-selection inference in finite samples for Gaussian-distributed data, and asymptotically when the errors are non-Gaussian.
no code implementations • 21 Nov 2021 • Isabella Verdinelli, Larry Wasserman
We propose a method for mitigating the effect of correlation by defining a modified version of LOCO.
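LOCO (leave-one-covariate-out) importance, which the modified version above builds on, can be sketched as follows; the simulation and linear model are our own illustration, not the paper's modified LOCO:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 3))
Y = 3.0 * X[:, 0] + rng.normal(size=n)          # only feature 0 is informative

train, test = slice(0, n // 2), slice(n // 2, n)

def heldout_error(cols):
    """Mean absolute held-out error of a model fit on the given columns."""
    model = LinearRegression().fit(X[train][:, cols], Y[train])
    return np.mean(np.abs(Y[test] - model.predict(X[test][:, cols])))

full_err = heldout_error([0, 1, 2])
# LOCO importance of feature j: increase in held-out error when j is left out
loco = [heldout_error([k for k in range(3) if k != j]) - full_err
        for j in range(3)]
```

Correlation between features is exactly what complicates this picture: dropping one of two correlated informative features barely changes the error, which motivates the modification proposed above.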
2 code implementations • 17 Nov 2021 • Robin Dunn, Aditya Gangrade, Larry Wasserman, Aaditya Ramdas
Shape constraints yield flexible middle grounds between fully nonparametric and fully parametric approaches to modeling distributions of data.
1 code implementation • 26 Jul 2021 • Tudor Manole, Sivaraman Balakrishnan, Jonathan Niles-Weed, Larry Wasserman
Our work also provides new bounds on the risk of corresponding plugin estimators for the quadratic Wasserstein distance, and we show how this problem relates to that of estimating optimal transport maps using stability arguments for smooth and strongly convex Brenier potentials.
no code implementations • 8 Mar 2021 • Isabella Verdinelli, Larry Wasserman
We use the output of a random forest to define a family of local smoothers with spatially adaptive bandwidth matrices.
no code implementations • 15 Feb 2021 • Purvasha Chakravarti, Mikael Kuusela, Jing Lei, Larry Wasserman
Here we instead investigate a model-independent method that does not make any assumptions about the signal and uses a semi-supervised classifier to detect the presence of the signal in the experimental data.
Applications High Energy Physics - Phenomenology Data Analysis, Statistics and Probability
1 code implementation • NeurIPS 2020 • Kwangho Kim, Jisu Kim, Manzil Zaheer, Joon Kim, Frederic Chazal, Larry Wasserman
We propose PLLay, a novel topological layer for general deep learning models based on persistence landscapes, in which we can efficiently exploit the underlying topological features of the input data structure.
no code implementations • 26 Jun 2020 • Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman
We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data.
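huge itself is an R package; as a rough Python analogue of the same task (sparse undirected-graph estimation via the graphical lasso), here is a sketch using scikit-learn, with a simulated chain-structured precision matrix of our own choosing:

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
# Chain-structured true precision matrix on 4 variables
prec = np.eye(4) + np.diag([0.4] * 3, 1) + np.diag([0.4] * 3, -1)
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(4), cov, size=500)

model = GraphicalLassoCV().fit(X)
# Edge (i, j) is present when the estimated precision entry is nonzero
adj = (np.abs(model.precision_) > 1e-8) & ~np.eye(4, dtype=bool)
```

The zeros of the estimated precision matrix encode conditional independencies, which is the graph the package recovers at scale.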
1 code implementation • ICML 2020 • Boyan Duan, Aaditya Ramdas, Larry Wasserman
We propose a method for multiple hypothesis testing with familywise error rate (FWER) control, called the i-FWER test.
Methodology
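As context for FWER control, the classic Holm step-down procedure (a baseline, not the i-FWER test itself) can be sketched in a few lines:

```python
import numpy as np

def holm(pvals, alpha=0.05):
    """Holm step-down: reject while the i-th smallest p-value <= alpha/(m - i)."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    rejected = np.zeros(m, dtype=bool)
    for i, idx in enumerate(order):          # i = 0, ..., m-1
        if p[idx] <= alpha / (m - i):        # threshold alpha/m, alpha/(m-1), ...
            rejected[idx] = True
        else:
            break                            # stop at the first failure
    return rejected
```

Holm controls the FWER under arbitrary dependence; the i-FWER test improves on such baselines by interactively using side information.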
4 code implementations • 10 Jan 2020 • Collin A. Politsch, Jessi Cisewski-Kehe, Rupert A. C. Croft, Larry Wasserman
The remaining studies share two broad themes: (1) estimating observable parameters of light curves and spectra; and (2) constructing observational spectral and light-curve templates.
Instrumentation and Methods for Astrophysics Cosmology and Nongalactic Astrophysics Earth and Planetary Astrophysics Solar and Stellar Astrophysics Applications
no code implementations • 24 Dec 2019 • Larry Wasserman, Aaditya Ramdas, Sivaraman Balakrishnan
Constructing tests and confidence sets for such models is notoriously difficult.
no code implementations • 7 Oct 2019 • Purvasha Chakravarti, Sivaraman Balakrishnan, Larry Wasserman
We consider clustering based on significance tests for Gaussian Mixture Models (GMMs).
2 code implementations • 17 Sep 2019 • Tudor Manole, Sivaraman Balakrishnan, Larry Wasserman
To motivate the choice of these classes, we also study minimax rates of estimating a distribution under the Sliced Wasserstein distance.
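The Sliced Wasserstein distance averages one-dimensional Wasserstein distances over random projections; a common Monte Carlo sketch (our illustration, using the fact that the 1-D distance between equal-size samples matches sorted projections) is:

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, p=2, rng=None):
    """Monte Carlo sliced p-Wasserstein distance between equal-size samples."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    theta = rng.normal(size=(n_proj, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # random unit directions
    # 1-D p-Wasserstein between equal-size samples: match sorted projections
    px, py = np.sort(X @ theta.T, axis=0), np.sort(Y @ theta.T, axis=0)
    return (np.mean(np.abs(px - py) ** p)) ** (1 / p)
```

Because each slice only requires sorting, the estimator scales far better with dimension than the full Wasserstein distance, which is why its minimax rates are of interest.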
2 code implementations • 20 Aug 2019 • Collin A. Politsch, Jessi Cisewski-Kehe, Rupert A. C. Croft, Larry Wasserman
The problem of denoising a one-dimensional signal possessing varying degrees of smoothness is ubiquitous in time-domain astronomy and astronomical spectroscopy.
Instrumentation and Methods for Astrophysics Cosmology and Nongalactic Astrophysics Applications
no code implementations • 24 May 2018 • Yotam Hechtlinger, Barnabás Póczos, Larry Wasserman
Our construction is based on $p(x|y)$ rather than $p(y|x)$, which yields a very cautious classifier: it outputs the null set (meaning "I don't know") when the object does not resemble the training examples.
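The $p(x|y)$ idea can be sketched with per-class density estimates and a per-class plausibility cutoff; the data, cutoff rule (a 5% quantile of training densities), and names below are our illustration, not the paper's exact construction:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Two well-separated 1-D classes
classes = {0: rng.normal(-2, 0.5, size=200), 1: rng.normal(2, 0.5, size=200)}

# Per-class density estimate p(x|y) and a density cutoff per class
kdes = {y: gaussian_kde(x) for y, x in classes.items()}
cutoff = {y: np.quantile(kdes[y](classes[y]), 0.05) for y in classes}

def predict_set(x):
    """All classes under which x is plausible; may be empty ('I don't know')."""
    return {y for y in classes if kdes[y]([x])[0] >= cutoff[y]}
```

A point far from all training data falls below every class's cutoff and receives the empty set, which is exactly the cautious behaviour described above.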
no code implementations • 17 Dec 2017 • Sivaraman Balakrishnan, Larry Wasserman
The statistical analysis of discrete data has been the subject of extensive statistical research dating back to the work of Pearson.
no code implementations • 30 Jun 2017 • Sivaraman Balakrishnan, Larry Wasserman
In contrast to existing results, we show that the minimax rate and critical testing radius in these settings depend strongly, and in a precise way, on the null distribution being tested, and this motivates the study of the (local) minimax rate as a function of the null distribution.
1 code implementation • 27 Sep 2016 • Larry Wasserman
Topological Data Analysis (TDA) can broadly be described as a collection of data analysis methods that find structure in data.
Methodology
no code implementations • 2 Sep 2016 • Mauricio Sadinle, Jing Lei, Larry Wasserman
In most classification tasks there are observations that are ambiguous and therefore difficult to correctly label.
no code implementations • 1 Jun 2016 • Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman
We present a method for finding high density, low-dimensional structures in noisy point clouds.
no code implementations • NeurIPS 2016 • Jisu Kim, Yen-Chi Chen, Sivaraman Balakrishnan, Alessandro Rinaldo, Larry Wasserman
A cluster tree provides a highly-interpretable summary of a density function by representing the hierarchy of its high-density clusters.
5 code implementations • 14 Apr 2016 • Jing Lei, Max G'Sell, Alessandro Rinaldo, Ryan J. Tibshirani, Larry Wasserman
In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
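The package above implements conformal prediction in R; as an illustrative Python sketch of the core split-conformal recipe (our simulation; the base model is even deliberately misspecified, since conformal coverage does not require a correct model):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(-3, 3, size=(n, 1))
Y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=n)

# Split: fit on one half, calibrate residuals on the other
fit, cal = slice(0, n // 2), slice(n // 2, n)
model = LinearRegression().fit(X[fit], Y[fit])
resid = np.abs(Y[cal] - model.predict(X[cal]))

alpha = 0.1
n_cal = n // 2
# Conformal quantile: the ceil((n_cal+1)(1-alpha))-th smallest residual
q = np.sort(resid)[int(np.ceil((n_cal + 1) * (1 - alpha))) - 1]

# Prediction band model.predict(x) +/- q has >= 90% marginal coverage
X_new = rng.uniform(-3, 3, size=(500, 1))
Y_new = np.sin(X_new[:, 0]) + 0.3 * rng.normal(size=500)
covered = np.mean(np.abs(Y_new - model.predict(X_new)) <= q)
```

The finite-sample coverage guarantee holds for any base predictor, which is what makes the recipe distribution-free.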
no code implementations • 6 Feb 2016 • Ilmun Kim, Aaditya Ramdas, Aarti Singh, Larry Wasserman
We prove two results that hold for any classifier in any dimension: if its true error remains $\epsilon$-better than chance for some $\epsilon>0$ as $d, n \to \infty$, then (a) the permutation-based test is consistent (has power approaching one), and (b) a computationally efficient test based on a Gaussian approximation of the null distribution is also consistent.
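The classifier-based permutation two-sample test can be sketched as follows; the data, classifier choice, and use of in-sample accuracy as the test statistic are our illustrative simplifications:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(100, 5))
Y = rng.normal(0, 1, size=(100, 5))
Y[:, 0] += 3.0                       # the two samples differ in one coordinate

Z = np.vstack([X, Y])
labels = np.r_[np.zeros(100), np.ones(100)]

def accuracy(lab):
    """Classification accuracy of sample membership, used as the test statistic."""
    clf = LogisticRegression().fit(Z, lab)
    return clf.score(Z, lab)

obs = accuracy(labels)
# Permutation null: re-fit under random relabelings of sample membership
perm = [accuracy(rng.permutation(labels)) for _ in range(200)]
pval = (1 + sum(a >= obs for a in perm)) / (1 + 200)
```

Under the null the labels are exchangeable, so the permutation p-value is exactly valid regardless of the classifier used.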
no code implementations • 23 Jan 2016 • Aaditya Ramdas, David Isenberg, Aarti Singh, Larry Wasserman
Linear independence testing is a fundamental information-theoretic and statistical problem that can be posed as follows: given $n$ points $\{(X_i, Y_i)\}^n_{i=1}$ from a $p+q$ dimensional multivariate distribution where $X_i \in \mathbb{R}^p$ and $Y_i \in\mathbb{R}^q$, determine whether $a^T X$ and $b^T Y$ are uncorrelated for every $a \in \mathbb{R}^p, b\in \mathbb{R}^q$ or not.
no code implementations • NeurIPS 2015 • Kirthevasan Kandasamy, Akshay Krishnamurthy, Barnabas Poczos, Larry Wasserman, James M. Robins
We propose and analyse estimators for statistical functionals of one or more distributions under nonparametric assumptions. Our estimators are derived from the von Mises expansion and are based on the theory of influence functions, which appear in the semiparametric statistics literature. We show that estimators based either on data-splitting or a leave-one-out technique enjoy fast rates of convergence and other favorable theoretical properties. We apply this framework to derive estimators for several popular information-theoretic quantities, and via empirical evaluation, show the advantage of this approach over existing estimators.
no code implementations • 8 Oct 2015 • Yen-Chi Chen, Daren Wang, Alessandro Rinaldo, Larry Wasserman
Persistence diagrams are two-dimensional plots that summarize the topological features of functions and are an important part of topological data analysis.
no code implementations • 4 Aug 2015 • Aaditya Ramdas, Sashank J. Reddi, Barnabas Poczos, Aarti Singh, Larry Wasserman
We formally characterize the power of popular tests for GDA like the Maximum Mean Discrepancy with the Gaussian kernel (gMMD) and bandwidth-dependent variants of the Energy Distance with the Euclidean norm (eED) in the high-dimensional MDA regime.
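The gMMD statistic analyzed above can be written down directly; this sketch computes the biased (V-statistic) estimate of squared MMD with a Gaussian kernel, with bandwidth and data our own:

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def k(A, B):
        # Pairwise squared distances, then the Gaussian kernel
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()
```

The bandwidth $\sigma$ plays exactly the role studied in the high-dimensional analysis above: power depends delicately on how it scales with dimension.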
1 code implementation • 29 Jun 2015 • Yen-Chi Chen, Christopher R. Genovese, Larry Wasserman
The Morse-Smale complex of a function $f$ decomposes the sample space into cells where $f$ is increasing or decreasing.
no code implementations • NeurIPS 2015 • Yen-Chi Chen, Christopher R. Genovese, Shirley Ho, Larry Wasserman
We introduce the concept of coverage risk as an error measure for density ridge estimation.
no code implementations • 15 May 2015 • Aaditya Ramdas, Barnabas Poczos, Aarti Singh, Larry Wasserman
For larger $\sigma$, the \textit{unflattening} of the regression function on convolution with uniform noise, along with its local antisymmetry around the threshold, together yield a behaviour where noise \textit{appears} to be beneficial.
no code implementations • 3 May 2015 • Martin Azizyan, Yen-Chi Chen, Aarti Singh, Larry Wasserman
We study the risk of mode-based clustering.
2 code implementations • 22 Dec 2014 • Frédéric Chazal, Brittany T. Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, Larry Wasserman
However, the empirical distance function is highly non-robust to noise and outliers.
Statistics Theory Computational Geometry Algebraic Topology
no code implementations • 4 Dec 2014 • Yen-Chi Chen, Christopher R. Genovese, Ryan J. Tibshirani, Larry Wasserman
Modal regression estimates the local modes of the distribution of $Y$ given $X=x$, rather than the conditional mean as in usual regression, and can hence reveal important structure missed by standard regression methods.
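A simple kernel-based sketch of conditional mode estimation (our illustration via locally weighted density estimation and a grid search, not the paper's algorithm):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
n = 600
X = rng.uniform(0, 1, size=n)
# Conditional distribution of Y is bimodal, with modes near +2 and -2
Y = np.where(rng.random(n) < 0.5, 2.0, -2.0) + 0.3 * rng.normal(size=n)

def conditional_modes(x0, h=0.2, grid=np.linspace(-4, 4, 401)):
    """Local modes of a kernel estimate of p(y | x = x0) via a grid search."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)            # Gaussian kernel weights in x
    kde = gaussian_kde(Y, weights=w / w.sum())        # weighted density of Y
    dens = kde(grid)
    # Interior local maxima of the estimated conditional density
    is_mode = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])
    return grid[1:-1][is_mode]

modes = conditional_modes(0.5)
```

A conditional-mean regression would return a value near 0 here, between the two modes; the mode set recovers both branches.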
no code implementations • 23 Nov 2014 • Aaditya Ramdas, Sashank J. Reddi, Barnabas Poczos, Aarti Singh, Larry Wasserman
The current literature is split into two kinds of tests: those that are consistent without any assumptions about how the distributions may differ (\textit{general} alternatives), and those designed to specifically test easier alternatives, such as a difference in means (\textit{mean-shift} alternatives).
2 code implementations • 17 Nov 2014 • Kirthevasan Kandasamy, Akshay Krishnamurthy, Barnabas Poczos, Larry Wasserman, James M. Robins
We propose and analyze estimators for statistical functionals of one or more distributions under nonparametric assumptions.
no code implementations • 30 Oct 2014 • Akshay Krishnamurthy, Kirthevasan Kandasamy, Barnabas Poczos, Larry Wasserman
We give a comprehensive theoretical characterization of a nonparametric estimator for the $L_2^2$ divergence between two continuous distributions.
no code implementations • 6 Aug 2014 • Mattia Ciollaro, Christopher Genovese, Jing Lei, Larry Wasserman
We introduce the functional mean-shift algorithm, an iterative algorithm for estimating the local modes of a surrogate density from functional data.
no code implementations • 29 Jun 2014 • Giuseppe Vinci, Peter Freeman, Jeffrey Newman, Larry Wasserman, Christopher Genovese
The incredible variety of galaxy shapes cannot be summarized by human-defined discrete classes of shapes without a possibly large loss of information.
no code implementations • 9 Jun 2014 • Larry Wasserman, Martin Azizyan, Aarti Singh
We provide explicit bounds on the error rate of the resulting clustering.
no code implementations • 9 Jun 2014 • Martin Azizyan, Aarti Singh, Larry Wasserman
We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions.
no code implementations • 9 Jun 2014 • Sashank J. Reddi, Aaditya Ramdas, Barnabás Póczos, Aarti Singh, Larry Wasserman
This paper is about two related decision theoretic problems, nonparametric two-sample testing and independence testing.
no code implementations • 7 Jun 2014 • Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, Larry Wasserman
Persistent homology is a multiscale method for analyzing the shape of sets and functions from point cloud data arising from an unknown distribution supported on those sets.
Algebraic Topology Computational Geometry Applications
no code implementations • 6 Jun 2014 • Yen-Chi Chen, Christopher R. Genovese, Larry Wasserman
Mode clustering is a nonparametric method for clustering that defines clusters using the basins of attraction of a density estimator's modes.
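Mode clustering assigns each point to the mode its mean-shift trajectory converges to; scikit-learn's `MeanShift` implements this idea, and the blob data and bandwidth below are our illustrative choices:

```python
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs centered at (0, 0) and (5, 5)
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)),
               rng.normal(5, 0.5, size=(100, 2))])

# Each point ascends the density estimate; points sharing a mode share a cluster
ms = MeanShift(bandwidth=1.0).fit(X)
n_clusters = len(np.unique(ms.labels_))
```

The clusters are the basins of attraction of the density estimate's modes, so the bandwidth controls how many modes, and hence clusters, survive.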
no code implementations • 12 Feb 2014 • Akshay Krishnamurthy, Kirthevasan Kandasamy, Barnabas Poczos, Larry Wasserman
We consider nonparametric estimation of $L_2$, Renyi-$\alpha$ and Tsallis-$\alpha$ divergences between continuous distributions.
no code implementations • 29 Dec 2013 • Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman
We derive nonparametric confidence intervals for the eigenvalues of the Hessian at modes of a density estimate.
no code implementations • 2 Dec 2013 • Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman
Persistent homology is a widely used tool in Topological Data Analysis that encodes multiscale topological information as a multi-set of points in the plane called a persistence diagram.
Statistics Theory Computational Geometry Algebraic Topology
1 code implementation • 2 Nov 2013 • Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Aarti Singh, Larry Wasserman
Persistent homology probes topological properties from point clouds and functions.
Algebraic Topology Computational Geometry Applications
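For the special case of zero-dimensional homology (connected components), the persistence diagram of a point cloud can be read off single-linkage merge heights; this sketch uses that correspondence on simulated data of our own:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
# Two tight clusters: the connected components are the 0-dim topological features
X = np.vstack([rng.normal(0, 0.1, size=(30, 2)),
               rng.normal(3, 0.1, size=(30, 2))])

# Single-linkage merge heights equal the death times of 0-dim persistence classes
deaths = np.sort(linkage(X, method='single')[:, 2])
# One death is far larger than the rest: exactly two highly persistent components
```

Short-lived classes correspond to noise, long-lived ones to genuine topological features; the statistical results above quantify that separation.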
no code implementations • 26 Sep 2013 • Larry Wasserman, Mladen Kolar, Alessandro Rinaldo
In particular, we consider: cluster graphs, restricted partial correlation graphs and correlation graphs.
no code implementations • 29 Jul 2013 • Sivaraman Balakrishnan, Alessandro Rinaldo, Aarti Singh, Larry Wasserman
In this note we use a different construction based on the direct analysis of the likelihood ratio test to show that the upper bound of Niyogi, Smale and Weinberger is in fact tight, thus establishing rate optimal asymptotic minimax bounds for the problem.
no code implementations • NeurIPS 2013 • Sivaraman Balakrishnan, Srivatsan Narayanan, Alessandro Rinaldo, Aarti Singh, Larry Wasserman
In this paper we investigate the problem of estimating the cluster tree for a density $f$ supported on or near a smooth $d$-dimensional manifold $M$ isometrically embedded in $\mathbb{R}^D$.
no code implementations • NeurIPS 2013 • Martin Azizyan, Aarti Singh, Larry Wasserman
While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds for their statistical performance as well as fundamental limits in high-dimensional settings are not well-understood.
no code implementations • 28 Mar 2013 • Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman Balakrishnan, Aarti Singh
Persistent homology is a method for probing topological properties of point clouds and functions.
no code implementations • 26 Feb 2013 • Jing Lei, Alessandro Rinaldo, Larry Wasserman
This paper applies conformal prediction techniques to compute simultaneous prediction bands and clustering trees for functional data.
no code implementations • 20 Dec 2012 • Christopher R. Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman
Ridge estimation is an extension of mode finding and is useful for understanding the structure of a density.
no code implementations • NeurIPS 2012 • Han Liu, Larry Wasserman, John D. Lafferty
We prove a new exponential concentration inequality for a plug-in estimator of the Shannon mutual information.
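A histogram-based plug-in estimator of mutual information, the kind of estimator such concentration results concern, can be sketched as follows (bin count and data are our illustrative choices):

```python
import numpy as np

def plugin_mi(x, y, bins=8):
    """Histogram plug-in estimate of mutual information (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()                                  # joint probabilities
    px = p.sum(axis=1, keepdims=True)                        # marginal of x
    py = p.sum(axis=0, keepdims=True)                        # marginal of y
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask]))
```

The estimator is positively biased for independent variables (roughly on the order of $(B-1)^2/2n$ for $B$ bins and $n$ samples), which is the kind of behaviour a concentration inequality makes precise.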
no code implementations • 7 Apr 2012 • Martin Azizyan, Aarti Singh, Larry Wasserman
Semisupervised methods are techniques for using labeled data $(X_1, Y_1),\ldots,(X_n, Y_n)$ together with unlabeled data $X_{n+1},\ldots, X_N$ to make predictions.
no code implementations • NeurIPS 2010 • Han Liu, Xi Chen, Larry Wasserman, John D. Lafferty
In this paper, we propose a semiparametric method for estimating $G(x)$ that builds a tree on the $X$ space just as in CART (classification and regression trees), but at each leaf of the tree estimates a graph.
2 code implementations • NeurIPS 2010 • Han Liu, Kathryn Roeder, Larry Wasserman
In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs.
no code implementations • NeurIPS 2008 • Han Liu, Larry Wasserman, John D. Lafferty
We propose new families of models and algorithms for high-dimensional nonparametric learning with joint sparsity constraints.
no code implementations • NeurIPS 2007 • Shuheng Zhou, Larry Wasserman, John D. Lafferty
Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data.
3 code implementations • 3 Jul 2007 • Ann B. Lee, Boaz Nadler, Larry Wasserman
In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered, with no particular meaning to the given order of the variables.
Methodology