no code implementations • 18 Sep 2023 • Xin Chen, Jason M. Klusowski, Yan Shuo Tan
In this paper, we learn these weights analogously by minimizing an estimate of the population risk subject to a nonnegativity constraint.
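As a rough illustration of fitting weights under a nonnegativity constraint, the sketch below minimizes an empirical squared-error risk by projected gradient descent. The snippet above does not say what the weights multiply, so the setup here (columns of `preds` are predictions from candidate models), the function name `fit_nonnegative_weights`, and the toy data are all assumptions made for illustration, not the paper's estimator.

```python
def fit_nonnegative_weights(preds, y, steps=200):
    """Minimize (1/n) * ||preds @ w - y||^2 over w >= 0 (hypothetical helper).

    preds: list of n rows, each holding p candidate-model predictions.
    y:     list of n responses.
    """
    n, p = len(preds), len(preds[0])
    # Step size 1/L, using the bound L <= (2/n) * (squared Frobenius norm).
    frob_sq = sum(v * v for row in preds for v in row)
    lr = n / (2.0 * frob_sq)
    w = [0.0] * p
    for _ in range(steps):
        resid = [sum(preds[i][j] * w[j] for j in range(p)) - y[i]
                 for i in range(n)]
        grad = [2.0 / n * sum(preds[i][j] * resid[i] for i in range(n))
                for j in range(p)]
        # Gradient step followed by projection onto the nonnegative orthant.
        w = [max(0.0, w[j] - lr * grad[j]) for j in range(p)]
    return w


# Toy data (assumed): unconstrained least squares would give w = (1, -1).
preds = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, -1.0, 0.0]
weights = fit_nonnegative_weights(preds, y)
```

On this toy problem the constraint is active: the second weight is clipped to zero and the first settles at 0.5, whereas the unconstrained fit would assign a negative weight to the second column.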
2 code implementations • 4 Jul 2023 • Abhineet Agarwal, Ana M. Kenney, Yan Shuo Tan, Tiffany M. Tang, Bin Yu
We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$.
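The single-split case of this equivalence can be checked numerically. The sketch below is a simplified illustration under stated assumptions: it looks at one split rather than a whole tree, uses raw sums of squares rather than the sample-weighted impurity used in MDI, and the function names and toy data are hypothetical. It compares the impurity decrease of a split with the explained sum of squares from regressing the response on the corresponding centered decision stump.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def impurity_decrease(x, y, threshold):
    """Drop in within-node sum of squares from splitting at `threshold`."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return sum_sq(y) - sum_sq(left) - sum_sq(right)


def explained_ss_of_stump(x, y, threshold):
    """Explained sum of squares from OLS of y on the centered stump."""
    stump = [1.0 if xi > threshold else 0.0 for xi in x]
    s_mean = sum(stump) / len(stump)
    y_mean = sum(y) / len(y)
    s_c = [s - s_mean for s in stump]
    s_norm_sq = sum(s * s for s in s_c)
    beta = sum(s * (yi - y_mean) for s, yi in zip(s_c, y)) / s_norm_sq
    return beta ** 2 * s_norm_sq


# Toy data (assumed); the split must leave both children non-empty.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 5.0, 6.0]
decrease = impurity_decrease(x, y, threshold=2.5)
explained = explained_ss_of_stump(x, y, threshold=2.5)
```

Both quantities come out equal here, which is the usual between-group versus within-group decomposition of the total sum of squares.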
no code implementations • 17 Oct 2022 • Omer Ronen, Theo Saarinen, Yan Shuo Tan, James Duncan, Bin Yu
In this paper, we provide the first lower bound on the mixing time for a simplified version of BART in which we reduce the sum to a single tree and use a subset of the possible moves for the MCMC proposal distribution.
2 code implementations • 2 Feb 2022 • Abhineet Agarwal, Yan Shuo Tan, Omer Ronen, Chandan Singh, Bin Yu
Tree-based models such as decision trees and random forests (RF) are a cornerstone of modern machine-learning practice.
2 code implementations • 28 Jan 2022 • Yan Shuo Tan, Chandan Singh, Keyan Nasseri, Abhineet Agarwal, James Duncan, Omer Ronen, Matthew Epland, Aaron Kornblith, Bin Yu
In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure.
1 code implementation • 18 Oct 2021 • Yan Shuo Tan, Abhineet Agarwal, Bin Yu
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models with $C^1$ component functions.
1 code implementation • 23 Aug 2020 • Raaz Dwivedi, Yan Shuo Tan, Briton Park, Mian Wei, Kevin Horgan, David Madigan, Bin Yu
Building on Yu and Kumbier's PCS framework, we introduce a novel methodology for randomized experiments, Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), which finds subgroups with large heterogeneous treatment effects.
1 code implementation • 16 May 2020 • Nick Altieri, Rebecca L. Barter, James Duncan, Raaz Dwivedi, Karl Kumbier, Xiao Li, Robert Netzorg, Briton Park, Chandan Singh, Yan Shuo Tan, Tiffany Tang, Yu Wang, Chao Zhang, Bin Yu
We use this data to develop predictions and corresponding prediction intervals for the short-term trajectory of COVID-19 cumulative death counts at the county level in the United States up to two weeks ahead.
no code implementations • 28 Oct 2019 • Yan Shuo Tan, Roman Vershynin
In recent literature, a general two-step procedure has been formulated for solving the problem of phase retrieval.
no code implementations • 12 Dec 2017 • Yan Shuo Tan
We consider the problem of high-dimensional misspecified phase retrieval.
no code implementations • 14 Sep 2017 • John Lipor, David Hong, Yan Shuo Tan, Laura Balzano
We present a novel geometric approach to the subspace clustering problem that leverages ensembles of the K-subspaces (KSS) algorithm via the evidence accumulation clustering framework.
no code implementations • 30 Jun 2017 • Yan Shuo Tan, Roman Vershynin
We consider the problem of phase retrieval, i.e., that of solving systems of quadratic equations.
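To make the "systems of quadratic equations" concrete, here is a minimal sketch of the real-valued measurement model $y_i = \langle a_i, x \rangle^2$. The measurement vectors `A`, the signal `x`, and the function name are illustrative assumptions. The example also shows why the signal can only be recovered up to a global sign: `x` and `-x` produce identical measurements.

```python
def quadratic_measurements(A, x):
    """Return y_i = <a_i, x>^2 for each measurement vector a_i in A."""
    return [sum(a_j * x_j for a_j, x_j in zip(a, x)) ** 2 for a in A]


# Toy measurement vectors and signal (assumed for illustration).
A = [[1, 2], [3, -1], [0, 1]]
x = [2, -1]
neg_x = [-v for v in x]

y = quadratic_measurements(A, x)
y_neg = quadratic_measurements(A, neg_x)
```

Because the sign of each inner product is lost when it is squared, `y` and `y_neg` coincide, so any solver can at best return `x` or `-x`.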
no code implementations • 4 Apr 2017 • Yan Shuo Tan, Roman Vershynin
The problem of Non-Gaussian Component Analysis (NGCA) is about finding a maximal low-dimensional subspace $E$ in $\mathbb{R}^n$ so that data points projected onto $E$ follow a non-Gaussian distribution.