Search Results for author: Yan Shuo Tan

Found 13 papers, 6 papers with code

Error Reduction from Stacked Regressions

no code implementations · 18 Sep 2023 · Xin Chen, Jason M. Klusowski, Yan Shuo Tan

In this paper, we learn these weights analogously by minimizing an estimate of the population risk subject to a nonnegativity constraint.

Model Selection · regression
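The weight-fitting step described in the abstract, minimizing an estimate of the population risk under a nonnegativity constraint, can be sketched with out-of-fold predictions and nonnegative least squares. This is a simplified stand-in, not the authors' exact estimator; the base learners and data below are illustrative:

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=200)

# Base learners: trees of different depths (stand-ins for the
# nested models being stacked).
depths = [2, 4, 6]
models = [DecisionTreeRegressor(max_depth=d, random_state=0) for d in depths]

# Out-of-fold predictions give an estimate of each learner's risk.
P = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in models])

# Nonnegative least squares: minimize ||y - P w||^2 subject to w >= 0.
w, _ = nnls(P, y)

for m in models:
    m.fit(X, y)
print("stacking weights:", np.round(w, 3))
```

The nonnegativity constraint is what distinguishes this from ordinary stacking by unconstrained least squares; some weights typically come out exactly zero, pruning redundant base learners.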

MDI+: A Flexible Random Forest-Based Feature Importance Framework

2 code implementations · 4 Jul 2023 · Abhineet Agarwal, Ana M. Kenney, Yan Shuo Tan, Tiffany M. Tang, Bin Yu

We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$.

Drug Response Prediction · Feature Importance · +1
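The stump-regression equivalence is the paper's theorem; as context for it, the per-tree MDI values it reinterprets can be read off a fitted sklearn forest, whose forest-level importance is the average of the per-tree scores (a sketch on toy data, not the MDI+ implementation):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy regression data with two informative features.
X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# MDI of feature X_k in one tree: total impurity decrease over the
# splits on X_k; the forest score averages the per-tree values.
per_tree = np.array([t.feature_importances_ for t in rf.estimators_])
print("forest MDI:", np.round(per_tree.mean(axis=0), 3))
```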

A Mixing Time Lower Bound for a Simplified Version of BART

no code implementations · 17 Oct 2022 · Omer Ronen, Theo Saarinen, Yan Shuo Tan, James Duncan, Bin Yu

In this paper, we provide the first lower bound on the mixing time for a simplified version of BART in which we reduce the sum to a single tree and use a subset of the possible moves for the MCMC proposal distribution.

Causal Inference · regression

Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods

2 code implementations · 2 Feb 2022 · Abhineet Agarwal, Yan Shuo Tan, Omer Ronen, Chandan Singh, Bin Yu

Tree-based models such as decision trees and random forests (RF) are a cornerstone of modern machine-learning practice.
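The hierarchical shrinkage rule from the paper can be applied post hoc to any fitted tree: telescope the node means along each root-to-leaf path and damp each increment by 1 + λ/N(parent), where N(parent) is the parent node's sample count. A minimal sketch on a sklearn tree (λ and the toy data are illustrative; the paper's released code lives in the `imodels` package):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] + 0.5 * rng.normal(size=300)

tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
t = tree.tree_
node_mean = t.value.ravel()                 # mean response in each node
n_node = t.n_node_samples.astype(float)     # sample count in each node

def hs_predict(X, lam=10.0):
    """Hierarchical shrinkage: telescope node means along the
    root-to-leaf path, damping each step by 1 + lam / N(parent)."""
    paths = tree.decision_path(X)
    out = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        nodes = paths.indices[paths.indptr[i]:paths.indptr[i + 1]]
        pred = node_mean[nodes[0]]          # start from the root mean
        for parent, child in zip(nodes[:-1], nodes[1:]):
            pred += (node_mean[child] - node_mean[parent]) / (1 + lam / n_node[parent])
        out[i] = pred
    return out

print("shrunk prediction for first point:", hs_predict(X[:1], lam=10.0))
```

Setting λ = 0 recovers the original tree exactly, while λ → ∞ shrinks every prediction to the root mean; intermediate values regularize deep, small-sample splits the most.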

Fast Interpretable Greedy-Tree Sums

2 code implementations · 28 Jan 2022 · Yan Shuo Tan, Chandan Singh, Keyan Nasseri, Abhineet Agarwal, James Duncan, Omer Ronen, Matthew Epland, Aaron Kornblith, Bin Yu

In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure.

Additive models · Decision Making · +4
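The inductive-bias claim is easy to see on a toy additive signal: a depth-one tree can track only one component, while a sum of two stumps fit greedily on residuals, a crude stand-in for the paper's greedy tree sums rather than its actual algorithm, captures both:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = np.sign(X[:, 0]) + np.sign(X[:, 1])   # additive, two-component signal

# A single stump must ignore one of the two components entirely.
single = DecisionTreeRegressor(max_depth=1).fit(X, y)

# Two stumps fit greedily: the second fits the first one's residuals.
s1 = DecisionTreeRegressor(max_depth=1).fit(X, y)
s2 = DecisionTreeRegressor(max_depth=1).fit(X, y - s1.predict(X))
sum_pred = s1.predict(X) + s2.predict(X)

mse_single = np.mean((y - single.predict(X)) ** 2)
mse_sum = np.mean((y - sum_pred) ** 2)
print("single stump MSE:", round(mse_single, 3))
print("stump-sum MSE:  ", round(mse_sum, 3))
```

The single stump is left with roughly the full variance of the ignored component, while the stump sum drives the training error near zero.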

A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

1 code implementation · 18 Oct 2021 · Yan Shuo Tan, Abhineet Agarwal, Bin Yu

We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models with $C^1$ component functions.

Additive models · Decision Making · +2

Stable discovery of interpretable subgroups via calibration in causal studies

1 code implementation · 23 Aug 2020 · Raaz Dwivedi, Yan Shuo Tan, Briton Park, Mian Wei, Kevin Horgan, David Madigan, Bin Yu

Building on Yu and Kumbier's PCS framework, we introduce Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), a novel methodology for identifying subgroups with large heterogeneous treatment effects in randomized experiments.

Curating a COVID-19 data repository and forecasting county-level death counts in the United States

1 code implementation · 16 May 2020 · Nick Altieri, Rebecca L. Barter, James Duncan, Raaz Dwivedi, Karl Kumbier, Xiao Li, Robert Netzorg, Briton Park, Chandan Singh, Yan Shuo Tan, Tiffany Tang, Yu Wang, Chao Zhang, Bin Yu

We use this data to develop predictions and corresponding prediction intervals for the short-term trajectory of COVID-19 cumulative death counts at the county level in the United States up to two weeks ahead.

COVID-19 Tracking · Decision Making · +2

Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval

no code implementations · 28 Oct 2019 · Yan Shuo Tan, Roman Vershynin

In the recent literature, a general two-step procedure has been formulated for solving the problem of phase retrieval.

Retrieval

Subspace Clustering using Ensembles of $K$-Subspaces

no code implementations · 14 Sep 2017 · John Lipor, David Hong, Yan Shuo Tan, Laura Balzano

We present a novel geometric approach to the subspace clustering problem that leverages ensembles of the K-subspaces (KSS) algorithm via the evidence accumulation clustering framework.

Clustering
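The idea in the abstract, running K-subspaces from many random initializations and combining the runs through evidence accumulation, can be sketched as follows. This is a minimal illustration of the combination step, not the authors' full algorithm; the KSS implementation, data, and clustering of the co-association matrix are all simplified:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def kss(X, K, dim, iters=30, seed=None):
    """One run of K-subspaces: alternate assigning points to the
    nearest subspace and refitting each subspace by SVD."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.integers(K, size=n)
    for _ in range(iters):
        bases = []
        for k in range(K):
            pts = X[labels == k]
            if len(pts) <= dim:      # re-seed a degenerate cluster
                pts = X[rng.choice(n, size=dim + 1, replace=False)]
            _, _, Vt = np.linalg.svd(pts, full_matrices=False)
            bases.append(Vt[:dim].T)             # ambient_dim x dim basis
        # residual of each point against each fitted subspace
        res = np.stack([np.linalg.norm(X - (X @ B) @ B.T, axis=1)
                        for B in bases], axis=1)
        labels = res.argmin(axis=1)
    return labels

# Two random 2-D subspaces in R^5, 100 points each.
rng = np.random.default_rng(0)
B1 = np.linalg.qr(rng.normal(size=(5, 2)))[0]
B2 = np.linalg.qr(rng.normal(size=(5, 2)))[0]
X = np.vstack([rng.normal(size=(100, 2)) @ B1.T,
               rng.normal(size=(100, 2)) @ B2.T])

# Evidence accumulation: average co-clustering over random restarts.
runs = [kss(X, K=2, dim=2, seed=s) for s in range(20)]
co = np.mean([np.equal.outer(r, r) for r in runs], axis=0)

# Cluster the co-association matrix (average linkage on 1 - co).
Z = linkage(squareform(1 - co, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust") - 1
```

Individual KSS runs can land in bad local minima; averaging their co-clustering patterns is what makes the ensemble estimate stable.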

Phase Retrieval via Randomized Kaczmarz: Theoretical Guarantees

no code implementations · 30 Jun 2017 · Yan Shuo Tan, Roman Vershynin

We consider the problem of phase retrieval, i.e., that of solving systems of quadratic equations.

Retrieval
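The randomized Kaczmarz update analyzed in the paper is simple to state: pick a random measurement, guess its sign from the current iterate, and project onto the corresponding hyperplane. A minimal sketch on Gaussian measurements; the initializer is placed near the true signal by hand, since the paper's guarantee is local (in practice a spectral initializer would supply this):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 400                      # signal dimension, number of measurements
A = rng.normal(size=(m, n))         # Gaussian measurement vectors a_i
x_true = rng.normal(size=n)
b = np.abs(A @ x_true)              # phaseless measurements |<a_i, x>|

# Start close to the signal, inside the local basin of attraction.
x = x_true + 0.2 * rng.normal(size=n)

row_norms = np.sum(A ** 2, axis=1)
for _ in range(20000):
    i = rng.integers(m)
    a = A[i]
    # Project onto the hyperplane <a, x> = sign(<a, x>) * b_i,
    # i.e. guess the measurement's sign from the current iterate.
    x += (np.sign(a @ x) * b[i] - a @ x) / row_norms[i] * a

# The signal is recoverable only up to a global sign.
rel_err = min(np.linalg.norm(x - x_true),
              np.linalg.norm(x + x_true)) / np.linalg.norm(x_true)
print("relative error:", rel_err)
```

As the iterate approaches ±x_true, the sign guesses become correct for almost every measurement and the iteration behaves like ordinary Kaczmarz on a consistent linear system, which converges linearly.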

Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods

no code implementations · 4 Apr 2017 · Yan Shuo Tan, Roman Vershynin

The problem of Non-Gaussian Component Analysis (NGCA) is about finding a maximal low-dimensional subspace $E$ in $\mathbb{R}^n$ so that data points projected onto $E$ follow a non-Gaussian distribution.
