Search Results for author: Yan Shuo Tan

Found 13 papers, 6 papers with code

Error Reduction from Stacked Regressions

no code implementations · 18 Sep 2023 · Xin Chen, Jason M. Klusowski, Yan Shuo Tan

In this paper, we learn these weights analogously by minimizing an estimate of the population risk subject to a nonnegativity constraint.

Model Selection · regression
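The weight-fitting step described in the abstract, minimizing an estimate of the population risk under a nonnegativity constraint, can be sketched with out-of-fold predictions and nonnegative least squares. This is a simplified stand-in, not the authors' exact estimator; the base learners and data below are illustrative:

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=200)

# Base learners: trees of different depths (stand-ins for the
# nested models being stacked).
depths = [2, 4, 6]
models = [DecisionTreeRegressor(max_depth=d, random_state=0) for d in depths]

# Out-of-fold predictions give an estimate of each learner's risk.
P = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in models])

# Nonnegative least squares: minimize ||y - P w||^2 subject to w >= 0.
w, _ = nnls(P, y)

for m in models:
    m.fit(X, y)
print("stacking weights:", np.round(w, 3))
```

The nonnegativity constraint is what distinguishes this from ordinary stacking by unconstrained least squares; some weights typically come out exactly zero, pruning redundant base learners.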

MDI+: A Flexible Random Forest-Based Feature Importance Framework

2 code implementations · 4 Jul 2023 · Abhineet Agarwal, Ana M. Kenney, Yan Shuo Tan, Tiffany M. Tang, Bin Yu

We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$.

Drug Response Prediction · Feature Importance · +1
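The stump-regression equivalence is the paper's theorem; as context for it, the per-tree MDI values it reinterprets can be read off a fitted sklearn forest, whose forest-level importance is the average of the per-tree scores (a sketch on toy data, not the MDI+ implementation):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy regression data with two informative features.
X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# MDI of feature X_k in one tree: total impurity decrease over the
# splits on X_k; the forest score averages the per-tree values.
per_tree = np.array([t.feature_importances_ for t in rf.estimators_])
print("forest MDI:", np.round(per_tree.mean(axis=0), 3))
```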

A Mixing Time Lower Bound for a Simplified Version of BART

no code implementations · 17 Oct 2022 · Omer Ronen, Theo Saarinen, Yan Shuo Tan, James Duncan, Bin Yu

In this paper, we provide the first lower bound on the mixing time for a simplified version of BART in which we reduce the sum to a single tree and use a subset of the possible moves for the MCMC proposal distribution.

Causal Inference · regression

Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods

2 code implementations · 2 Feb 2022 · Abhineet Agarwal, Yan Shuo Tan, Omer Ronen, Chandan Singh, Bin Yu

Tree-based models such as decision trees and random forests (RF) are a cornerstone of modern machine-learning practice.
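The hierarchical shrinkage rule from the paper can be applied post hoc to any fitted tree: telescope the node means along each root-to-leaf path and damp each increment by 1 + λ/N(parent), where N(parent) is the parent node's sample count. A minimal sketch on a sklearn tree (λ and the toy data are illustrative; the paper's released code lives in the `imodels` package):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] + 0.5 * rng.normal(size=300)

tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
t = tree.tree_
node_mean = t.value.ravel()                 # mean response in each node
n_node = t.n_node_samples.astype(float)     # sample count in each node

def hs_predict(X, lam=10.0):
    """Hierarchical shrinkage: telescope node means along the
    root-to-leaf path, damping each step by 1 + lam / N(parent)."""
    paths = tree.decision_path(X)
    out = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        nodes = paths.indices[paths.indptr[i]:paths.indptr[i + 1]]
        pred = node_mean[nodes[0]]          # start from the root mean
        for parent, child in zip(nodes[:-1], nodes[1:]):
            pred += (node_mean[child] - node_mean[parent]) / (1 + lam / n_node[parent])
        out[i] = pred
    return out

print("shrunk prediction for first point:", hs_predict(X[:1], lam=10.0))
```

Setting λ = 0 recovers the original tree exactly, while λ → ∞ shrinks every prediction to the root mean; intermediate values regularize deep, small-sample splits the most.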

Fast Interpretable Greedy-Tree Sums

2 code implementations · 28 Jan 2022 · Yan Shuo Tan, Chandan Singh, Keyan Nasseri, Abhineet Agarwal, James Duncan, Omer Ronen, Matthew Epland, Aaron Kornblith, Bin Yu

In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure.

Additive models · Decision Making · +4
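The inductive-bias claim is easy to see on a toy additive signal: a depth-one tree can track only one component, while a sum of two stumps fit greedily on residuals, a crude stand-in for the paper's greedy tree sums rather than its actual algorithm, captures both:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = np.sign(X[:, 0]) + np.sign(X[:, 1])   # additive, two-component signal

# A single stump must ignore one of the two components entirely.
single = DecisionTreeRegressor(max_depth=1).fit(X, y)

# Two stumps fit greedily: the second fits the first one's residuals.
s1 = DecisionTreeRegressor(max_depth=1).fit(X, y)
s2 = DecisionTreeRegressor(max_depth=1).fit(X, y - s1.predict(X))
sum_pred = s1.predict(X) + s2.predict(X)

mse_single = np.mean((y - single.predict(X)) ** 2)
mse_sum = np.mean((y - sum_pred) ** 2)
print("single stump MSE:", round(mse_single, 3))
print("stump-sum MSE:  ", round(mse_sum, 3))
```

The single stump is left with roughly the full variance of the ignored component, while the stump sum drives the training error near zero.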

A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

1 code implementation · 18 Oct 2021 · Yan Shuo Tan, Abhineet Agarwal, Bin Yu

We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models with $C^1$ component functions.

Additive models · Decision Making · +2

Stable discovery of interpretable subgroups via calibration in causal studies

1 code implementation · 23 Aug 2020 · Raaz Dwivedi, Yan Shuo Tan, Briton Park, Mian Wei, Kevin Horgan, David Madigan, Bin Yu

Building on Yu and Kumbier's PCS framework, we introduce Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), a novel methodology for identifying subgroups with large heterogeneous treatment effects in randomized experiments.

Curating a COVID-19 data repository and forecasting county-level death counts in the United States

1 code implementation · 16 May 2020 · Nick Altieri, Rebecca L. Barter, James Duncan, Raaz Dwivedi, Karl Kumbier, Xiao Li, Robert Netzorg, Briton Park, Chandan Singh, Yan Shuo Tan, Tiffany Tang, Yu Wang, Chao Zhang, Bin Yu

We use this data to develop predictions and corresponding prediction intervals for the short-term trajectory of COVID-19 cumulative death counts at the county level in the United States up to two weeks ahead.

COVID-19 Tracking · Decision Making · +2

Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval

no code implementations · 28 Oct 2019 · Yan Shuo Tan, Roman Vershynin

In the recent literature, a general two-step procedure has been formulated for solving the problem of phase retrieval.

Retrieval

Subspace Clustering using Ensembles of $K$-Subspaces

no code implementations · 14 Sep 2017 · John Lipor, David Hong, Yan Shuo Tan, Laura Balzano

We present a novel geometric approach to the subspace clustering problem that leverages ensembles of the K-subspaces (KSS) algorithm via the evidence accumulation clustering framework.

Clustering
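The idea in the abstract, running K-subspaces from many random initializations and combining the runs through evidence accumulation, can be sketched as follows. This is a minimal illustration of the combination step, not the authors' full algorithm; the KSS implementation, data, and clustering of the co-association matrix are all simplified:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def kss(X, K, dim, iters=30, seed=None):
    """One run of K-subspaces: alternate assigning points to the
    nearest subspace and refitting each subspace by SVD."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.integers(K, size=n)
    for _ in range(iters):
        bases = []
        for k in range(K):
            pts = X[labels == k]
            if len(pts) <= dim:      # re-seed a degenerate cluster
                pts = X[rng.choice(n, size=dim + 1, replace=False)]
            _, _, Vt = np.linalg.svd(pts, full_matrices=False)
            bases.append(Vt[:dim].T)             # ambient_dim x dim basis
        # residual of each point against each fitted subspace
        res = np.stack([np.linalg.norm(X - (X @ B) @ B.T, axis=1)
                        for B in bases], axis=1)
        labels = res.argmin(axis=1)
    return labels

# Two random 2-D subspaces in R^5, 100 points each.
rng = np.random.default_rng(0)
B1 = np.linalg.qr(rng.normal(size=(5, 2)))[0]
B2 = np.linalg.qr(rng.normal(size=(5, 2)))[0]
X = np.vstack([rng.normal(size=(100, 2)) @ B1.T,
               rng.normal(size=(100, 2)) @ B2.T])

# Evidence accumulation: average co-clustering over random restarts.
runs = [kss(X, K=2, dim=2, seed=s) for s in range(20)]
co = np.mean([np.equal.outer(r, r) for r in runs], axis=0)

# Cluster the co-association matrix (average linkage on 1 - co).
Z = linkage(squareform(1 - co, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust") - 1
```

Individual KSS runs can land in bad local minima; averaging their co-clustering patterns is what makes the ensemble estimate stable.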

Phase Retrieval via Randomized Kaczmarz: Theoretical Guarantees

no code implementations · 30 Jun 2017 · Yan Shuo Tan, Roman Vershynin

We consider the problem of phase retrieval, i.e., that of solving systems of quadratic equations.

Retrieval
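The randomized Kaczmarz update analyzed in the paper is simple to state: pick a random measurement, guess its sign from the current iterate, and project onto the corresponding hyperplane. A minimal sketch on Gaussian measurements; the initializer is placed near the true signal by hand, since the paper's guarantee is local (in practice a spectral initializer would supply this):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 400                      # signal dimension, number of measurements
A = rng.normal(size=(m, n))         # Gaussian measurement vectors a_i
x_true = rng.normal(size=n)
b = np.abs(A @ x_true)              # phaseless measurements |<a_i, x>|

# Start close to the signal, inside the local basin of attraction.
x = x_true + 0.2 * rng.normal(size=n)

row_norms = np.sum(A ** 2, axis=1)
for _ in range(20000):
    i = rng.integers(m)
    a = A[i]
    # Project onto the hyperplane <a, x> = sign(<a, x>) * b_i,
    # i.e. guess the measurement's sign from the current iterate.
    x += (np.sign(a @ x) * b[i] - a @ x) / row_norms[i] * a

# The signal is recoverable only up to a global sign.
rel_err = min(np.linalg.norm(x - x_true),
              np.linalg.norm(x + x_true)) / np.linalg.norm(x_true)
print("relative error:", rel_err)
```

As the iterate approaches ±x_true, the sign guesses become correct for almost every measurement and the iteration behaves like ordinary Kaczmarz on a consistent linear system, which converges linearly.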

Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods

no code implementations · 4 Apr 2017 · Yan Shuo Tan, Roman Vershynin

The problem of Non-Gaussian Component Analysis (NGCA) is about finding a maximal low-dimensional subspace $E$ in $\mathbb{R}^n$ so that data points projected onto $E$ follow a non-Gaussian distribution.
