Search Results for author: Jason M. Klusowski

Found 25 papers, 0 papers with code

Challenges in Variable Importance Ranking Under Correlation

no code implementations5 Feb 2024 Annie Liang, Thomas Jemielita, Andy Liaw, Vladimir Svetnik, Lingkang Huang, Richard Baumgartner, Jason M. Klusowski

Recently, several knockoff-based adjustments to marginal permutation importance have been proposed to address this issue, such as the variable importance measure known as conditional predictive impact (CPI).

Feature Correlation, Interpretable Machine Learning
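
As context for the excerpt above, here is a minimal sketch of plain marginal permutation importance, the baseline that breaks down under feature correlation; CPI would additionally replace the shuffled column with a knockoff copy, which is not implemented here. `model` is assumed to be any fitted regressor with a scikit-learn-style `predict` method.

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Marginal permutation importance: average increase in MSE when one
    feature column is shuffled, severing its link to the response. Under
    correlated features this ranking is known to be unreliable."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((model.predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += np.mean((model.predict(Xp) - y) ** 2) - base_mse
    return scores / n_repeats
```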

Stochastic Gradient Descent for Additive Nonparametric Regression

no code implementations1 Jan 2024 Xin Chen, Jason M. Klusowski

This paper introduces an iterative algorithm for training additive models that enjoys favorable memory and computational requirements.

Additive models, regression
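
A minimal sketch of what such an algorithm can look like, assuming each additive component is expanded in a small fixed basis and the features are scaled to $[0, 1]$; this illustrates SGD for a linear-in-parameters additive model, not the paper's exact procedure.

```python
import numpy as np

def sgd_additive_fit(X, y, n_basis=10, lr=0.1, epochs=5, seed=0):
    """SGD for an additive model f(x) = sum_j f_j(x_j), each component
    expanded in a cosine basis. Memory is O(d * n_basis), independent
    of the sample size, since samples are processed one at a time."""
    n, d = X.shape
    W = np.zeros((d, n_basis))          # basis coefficients per coordinate
    freqs = np.arange(1, n_basis + 1)
    order = np.random.default_rng(seed).permutation(n * epochs) % n
    for i in order:
        phi = np.cos(np.pi * np.outer(X[i], freqs))   # (d, n_basis) features
        resid = np.sum(W * phi) - y[i]                # prediction error
        W -= lr * resid * phi                         # stochastic gradient step
    return W
```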

Inference with Mondrian Random Forests

no code implementations15 Oct 2023 Matias D. Cattaneo, Jason M. Klusowski, William G. Underwood

Random forests are popular methods for classification and regression, and many different variants have been proposed in recent years.

regression

Robust Transfer Learning with Unreliable Source Data

no code implementations6 Oct 2023 Jianqing Fan, Cheng Gao, Jason M. Klusowski

This paper addresses challenges in robust transfer learning stemming from ambiguity in Bayes classifiers and weak transferable signals between the target and source distributions.

regression, Transfer Learning

Error Reduction from Stacked Regressions

no code implementations18 Sep 2023 Xin Chen, Jason M. Klusowski, Yan Shuo Tan

In this paper, we learn these weights analogously by minimizing an estimate of the population risk subject to a nonnegativity constraint.

Model Selection, regression
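
A minimal sketch of the constrained weight fit described above, using nonnegative least squares; the columns of `P` are assumed to hold (ideally out-of-fold) predictions of the base regressions, and the paper's actual risk estimate may differ from this plain squared error.

```python
import numpy as np
from scipy.optimize import nnls

def stacking_weights(P, y):
    """Stacked regression weights: minimize ||y - P w||^2 subject to w >= 0,
    where P[:, k] holds the predictions of the k-th base model."""
    w, _ = nnls(P, y)
    return w

# Ensemble prediction on new data: P_new @ stacking_weights(P, y)
```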

On the Implicit Bias of Adam

no code implementations31 Aug 2023 Matias D. Cattaneo, Jason M. Klusowski, Boris Shigida

In previous literature, backward error analysis was used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory.
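
A toy numerical check of the backward-error-analysis idea for plain gradient descent (not the paper's Adam analysis): on $f(x) = \tfrac{1}{2}\lambda x^2$, GD with step $h$ tracks the flow of the modified loss $f + \tfrac{h}{4}|f'|^2$ to one order higher than the naive gradient flow.

```python
import numpy as np

lam, h, k, x0 = 1.0, 0.1, 50, 1.0
gd       = x0 * (1 - h * lam) ** k                        # exact GD iterate
flow     = x0 * np.exp(-lam * h * k)                      # gradient flow at t = kh
mod_flow = x0 * np.exp(-lam * (1 + h * lam / 2) * h * k)  # modified-ODE flow
print(abs(gd - flow), abs(gd - mod_flow))  # the second error is an order smaller
```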

Sharp Convergence Rates for Matching Pursuit

no code implementations15 Jul 2023 Jason M. Klusowski, Jonathan W. Siegel

We study the fundamental limits of matching pursuit, or the pure greedy algorithm, for approximating a target function by a sparse linear combination of elements from a dictionary.
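
For reference, a minimal sketch of the pure greedy algorithm itself, assuming a dictionary matrix with unit-norm columns:

```python
import numpy as np

def matching_pursuit(D, y, n_iter=100):
    """Pure greedy algorithm: repeatedly select the dictionary column most
    correlated with the residual and subtract its best scalar multiple.
    D is (n, p) with unit-norm columns."""
    resid = y.astype(float).copy()
    coef = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ resid
        k = int(np.argmax(np.abs(corr)))
        coef[k] += corr[k]              # optimal step for a unit-norm column
        resid -= corr[k] * D[:, k]
    return coef
```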

Large Scale Prediction with Decision Trees

no code implementations28 Apr 2021 Jason M. Klusowski, Peter M. Tian

This paper shows that decision trees constructed with Classification and Regression Trees (CART) and C4.5 methodology are consistent for regression and classification tasks, even when the number of predictor variables grows sub-exponentially with the sample size, under natural 0-norm and 1-norm sparsity constraints.

regression

Nonparametric Variable Screening with Optimal Decision Stumps

no code implementations5 Nov 2020 Jason M. Klusowski, Peter M. Tian

Decision trees and their ensembles are endowed with a rich set of diagnostic tools for ranking and screening variables in a predictive model.

Model Selection, Variable Selection
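
A minimal sketch of stump-based screening in this spirit: score each variable by the best single-split reduction in the sum of squared errors, then rank (ties between equal feature values are ignored for brevity; this is an illustration, not the paper's optimal procedure).

```python
import numpy as np

def stump_gain(x, y):
    """Largest reduction in sum of squared errors achievable by a single
    split on feature x."""
    ys = y[np.argsort(x)]
    n, total = len(ys), ys.sum()
    left = np.cumsum(ys)[:-1]                  # left-node sums for each split
    sizes = np.arange(1, n)
    return np.max(left**2 / sizes + (total - left)**2 / (n - sizes)
                  - total**2 / n)

def screen(X, y):
    """Rank variables by the impurity reduction of their optimal stump."""
    gains = np.array([stump_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(-gains)
```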

Good Classifiers are Abundant in the Interpolating Regime

no code implementations22 Jun 2020 Ryan Theisen, Jason M. Klusowski, Michael W. Mahoney

Inspired by the statistical mechanics approach to learning, we formally define and develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers from several model classes.

Learning Theory

Sparse learning with CART

no code implementations NeurIPS 2020 Jason M. Klusowski

In doing so, we find that the training error is governed by the Pearson correlation between the optimal decision stump and response data in each node, which we bound by constructing a prior distribution on the split points and solving a nonlinear optimization problem.

regression, Sparse Learning
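
The Pearson-correlation connection can be checked numerically: because the optimal stump is a least-squares fit onto two-piece constants, its impurity reduction relative to the node's total sum of squares equals the squared correlation between the stump's predictions and the response. A small sketch (brute-force split search, illustrative only):

```python
import numpy as np

def fit_stump(x, y):
    """Least-squares decision stump: best threshold with left/right means."""
    best_sse, best_pred = np.inf, None
    for t in np.unique(x)[:-1]:
        left = x <= t
        pred = np.where(left, y[left].mean(), y[~left].mean())
        sse = np.sum((y - pred) ** 2)
        if sse < best_sse:
            best_sse, best_pred = sse, pred
    return best_pred

rng = np.random.default_rng(0)
x = rng.uniform(size=300)
y = (x > 0.4).astype(float) + 0.3 * rng.standard_normal(300)
pred = fit_stump(x, y)
r2_sse = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
r2_corr = np.corrcoef(pred, y)[0, 1] ** 2
print(r2_sse, r2_corr)   # numerically identical
```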

Global Capacity Measures for Deep ReLU Networks via Path Sampling

no code implementations22 Oct 2019 Ryan Theisen, Jason M. Klusowski, Huan Wang, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

Classical results on the statistical complexity of linear models have commonly identified the norm of the weights $\|w\|$ as a fundamental capacity measure.

Generalization Bounds, Multi-class Classification
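
The $\ell^1$ path norm relevant here (the sum over all input-output paths of the product of absolute weights) collapses to a product of absolute weight matrices; a minimal sketch for a bias-free ReLU network, with `weights` listing each layer's matrix mapping layer $k$ to layer $k+1$:

```python
import numpy as np

def l1_path_norm(weights):
    """Sum over all input-output paths of the product of absolute weights,
    computed layer by layer as 1^T |W_L| ... |W_1| 1."""
    v = np.ones(weights[0].shape[1])   # one entry per input unit
    for W in weights:
        v = np.abs(W) @ v
    return float(v.sum())

# e.g. l1_path_norm([W1, W2, W3]) for a three-layer ReLU network
```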

Analyzing CART

no code implementations24 Jun 2019 Jason M. Klusowski

For binary classification and regression models, this approach recursively divides the data into two near-homogeneous daughter nodes according to a split point that maximizes the reduction in the sum of squared errors (the impurity) along a particular variable.

Binary Classification, regression
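
A compact sketch of that recursive splitting rule (exhaustive search over variables and thresholds with squared-error impurity; illustrative rather than production CART, which adds pruning and stopping rules):

```python
import numpy as np

def build_tree(X, y, depth, min_leaf=5):
    """Recursively split on the (variable, threshold) pair that maximizes
    the reduction in sum of squares; leaves store the node mean."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return float(y.mean())
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            if min(left.sum(), (~left).sum()) < min_leaf:
                continue
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:
        return float(y.mean())
    _, j, t = best
    left = X[:, j] <= t
    return (j, t, build_tree(X[left], y[left], depth - 1, min_leaf),
            build_tree(X[~left], y[~left], depth - 1, min_leaf))

def predict_one(node, x):
    """Follow splits until a leaf value is reached."""
    while isinstance(node, tuple):
        j, t, lo, hi = node
        node = lo if x[j] <= t else hi
    return node
```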

Complexity, Statistical Risk, and Metric Entropy of Deep Nets Using Total Path Variation

no code implementations2 Feb 2019 Andrew R. Barron, Jason M. Klusowski

For any ReLU network there is a representation in which the sum of the absolute values of the weights into each node is exactly $1$, and the input layer variables are multiplied by a value $V$ coinciding with the total variation of the path weights.
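
This normalized representation exists because ReLU is positively homogeneous, so scale can be shifted between consecutive layers without changing the function; a two-layer numerical check (illustrative; biases omitted):

```python
import numpy as np

relu = lambda z: np.maximum(z, 0)
rng = np.random.default_rng(1)
W1, W2, x = rng.normal(size=(4, 3)), rng.normal(size=(1, 4)), rng.normal(size=3)

# Rescale each hidden unit so its incoming absolute weights sum to 1,
# pushing the removed scale into the outgoing layer.
c = np.abs(W1).sum(axis=1)
out_orig = W2 @ relu(W1 @ x)
out_norm = (W2 * c) @ relu((W1 / c[:, None]) @ x)
print(np.allclose(out_orig, out_norm))   # True: same function, normalized weights
```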

Approximation and Estimation for High-Dimensional Deep Learning Networks

no code implementations10 Sep 2018 Andrew R. Barron, Jason M. Klusowski

It has been experimentally observed in recent years that multi-layer artificial neural networks have a surprising ability to generalize, even when trained with far more parameters than observations.

Sharp Analysis of a Simple Model for Random Forests

no code implementations7 May 2018 Jason M. Klusowski

Random forests have become an important tool for improving accuracy in regression and classification problems since their inception by Leo Breiman in 2001.

regression

Counting Motifs with Graph Sampling

no code implementations21 Feb 2018 Jason M. Klusowski, Yihong Wu

Applied researchers often construct a network from a random sample of nodes in order to infer properties of the parent network.

Graph Sampling
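
A simple inverse-probability baseline for this setting (illustrative, not the paper's estimator): under Bernoulli($p$) vertex sampling with an observed induced subgraph, a motif on $k$ vertices survives with probability $p^k$, so dividing the sampled count by $p^k$ gives an unbiased estimate. For triangles:

```python
import numpy as np

def estimate_triangles(A, p, seed=0):
    """Estimate the parent graph's triangle count from the subgraph induced
    by a Bernoulli(p) vertex sample: each triangle is retained w.p. p^3."""
    rng = np.random.default_rng(seed)
    keep = np.flatnonzero(rng.random(A.shape[0]) < p)
    S = A[np.ix_(keep, keep)]
    sampled = np.trace(S @ S @ S) / 6    # closed 3-walks / 6 = triangles
    return sampled / p**3
```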

Estimating the Number of Connected Components in a Graph via Subgraph Sampling

no code implementations12 Jan 2018 Jason M. Klusowski, Yihong Wu

Learning properties of large graphs from samples has been an important problem in statistical network analysis since the early work of Goodman (1949) and Frank (1978).

Finite-sample risk bounds for maximum likelihood estimation with arbitrary penalties

no code implementations29 Dec 2017 W. D. Brinda, Jason M. Klusowski

The MDL two-part coding "index of resolvability" provides a finite-sample upper bound on the statistical risk of penalized likelihood estimators over countable models.
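
A toy instance of the penalized likelihood estimators such a bound applies to, choosing between two unit-variance Gaussian models for a sample; the BIC-style penalty stands in for a two-part codelength and is illustrative only:

```python
import numpy as np

def gauss_loglik(y, mu):
    """Log-likelihood of y under N(mu, 1)."""
    return -0.5 * np.sum((y - mu) ** 2) - 0.5 * len(y) * np.log(2 * np.pi)

def select_model(y):
    """Pick the model minimizing -log-likelihood + penalty."""
    scores = {
        "zero-mean": -gauss_loglik(y, 0.0),
        "free-mean": -gauss_loglik(y, y.mean()) + 0.5 * np.log(len(y)),
    }
    return min(scores, key=scores.get)
```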

Estimating the Coefficients of a Mixture of Two Linear Regressions by Expectation Maximization

no code implementations26 Apr 2017 Jason M. Klusowski, Dana Yang, W. D. Brinda

We also show that the population EM operator for mixtures of two regressions is anti-contractive from the target parameter vector if the cosine angle between the input vector and the target parameter vector is too small, thereby establishing the necessity of our conic condition.
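
For concreteness, the standard sample EM iteration for a balanced mixture of two symmetric linear regressions $y = \pm\langle\beta, x\rangle + N(0, 1)$, whose population version the excerpt analyzes:

```python
import numpy as np

def em_two_regressions(X, y, beta0, n_iter=50):
    """E-step: soft-assign the hidden sign via tanh(y * <beta, x>).
    M-step: solve the resulting weighted least-squares problem."""
    beta = beta0.copy()
    XtX_inv = np.linalg.inv(X.T @ X)
    for _ in range(n_iter):
        w = np.tanh(y * (X @ beta))      # posterior mean of the hidden sign
        beta = XtX_inv @ (X.T @ (w * y))
    return beta
```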

Minimax Lower Bounds for Ridge Combinations Including Neural Nets

no code implementations9 Feb 2017 Jason M. Klusowski, Andrew R. Barron

Estimation of functions of $d$ variables is considered using ridge combinations of the form $\sum_{k=1}^m c_{1,k}\,\phi\!\left(\sum_{j=1}^d c_{0,j,k}x_j - b_k\right)$ where the activation function $\phi$ is a function with bounded value and derivative.

Statistical Guarantees for Estimating the Centers of a Two-component Gaussian Mixture by EM

no code implementations7 Aug 2016 Jason M. Klusowski, W. D. Brinda

In that method, the basin of attraction for valid initialization is required to be a ball around the truth.

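For reference, the standard sample EM iteration for this model (a balanced mixture of $N(\pm\theta, I_d)$), whose convergence the excerpt discusses:

```python
import numpy as np

def em_symmetric_gmm(X, theta0, n_iter=50):
    """EM for a balanced two-component Gaussian mixture with means
    +/- theta and identity covariance: theta <- mean of tanh(<theta, x>) x."""
    theta = theta0.copy()
    for _ in range(n_iter):
        theta = (np.tanh(X @ theta)[:, None] * X).mean(axis=0)
    return theta
```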

Approximation by Combinations of ReLU and Squared ReLU Ridge Functions with $ \ell^1 $ and $ \ell^0 $ Controls

no code implementations26 Jul 2016 Jason M. Klusowski, Andrew R. Barron

We establish $ L^{\infty} $ and $ L^2 $ error bounds for functions of many variables that are approximated by linear combinations of ReLU (rectified linear unit) and squared ReLU ridge functions with $ \ell^1 $ and $ \ell^0 $ controls on their inner and outer parameters.

Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks

no code implementations5 Jul 2016 Jason M. Klusowski, Andrew R. Barron

On the other hand, if the candidate fits are chosen from a discretization, we show that $ \mathbb{E}\|\hat{f} - f^{\star} \|^2 \leq \left(v^3_{f^{\star}}\frac{\log d}{n}\right)^{2/5} $.
