Optimal Robust Classification Trees

In many high-stakes domains, the data used to train machine learning models is noisy (due to, e.g., the sensitive nature of the data being collected or the limited resources available to validate it). This can cause a distribution shift, where the distribution of the training data does not match that of the testing data, and under such shifts any trained model may perform poorly at test time. In this paper, motivated by the need for interpretability and robustness, we propose a mixed-integer optimization formulation and a tailored solution algorithm for learning optimal classification trees that are robust to adversarial perturbations in the data features. We evaluate our approach on numerous publicly available datasets and compare it to a regularized, non-robust optimal tree. Across several datasets and distribution shifts, the robust solution yields an increase of up to 14.16 percent in worst-case accuracy and up to 4.72 percent in average-case accuracy over the non-robust solution.
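
To give a rough sense of how robustness to feature perturbations can enter a mixed-integer formulation, the sketch below encodes a depth-1 "robust" classification stump; it is not the paper's formulation, only an illustration. All specifics are assumptions made for the example: features scaled to [0, 1], a perturbation budget `eps`, big-M constants, fixed leaf labels (left leaf predicts class 0, right leaf predicts class 1), and the PuLP/CBC solver.

```python
# Minimal sketch (assumptions noted above, not the paper's model):
# a point only counts as correct if it stays on the right side of the
# split even after its chosen feature is shifted by up to eps.
import numpy as np
import pulp

rng = np.random.default_rng(0)
X = rng.random((40, 3))                      # 40 points, 3 features in [0, 1]
y = (X[:, 1] > 0.5).astype(int)              # labels depend on feature 1

eps = 0.05                                   # adversarial perturbation budget
gap = 1e-4                                   # small margin mimicking a strict inequality
big_M = 1 + eps + gap                        # valid big-M for [0, 1]-scaled features

prob = pulp.LpProblem("robust_stump", pulp.LpMaximize)

n, p = X.shape
d = [pulp.LpVariable(f"d_{j}", cat="Binary") for j in range(p)]   # which feature to split on
b = pulp.LpVariable("b", lowBound=0, upBound=1)                   # split threshold
z = [pulp.LpVariable(f"z_{i}", cat="Binary") for i in range(n)]   # robustly correct indicator

prob += pulp.lpSum(z)                        # maximize robustly correct points
prob += pulp.lpSum(d) == 1                   # exactly one split feature

for i in range(n):
    xi_d = pulp.lpSum(X[i, j] * d[j] for j in range(p))
    if y[i] == 0:
        # must land in the left leaf even if its feature is pushed up by eps
        prob += xi_d + eps <= b + big_M * (1 - z[i])
    else:
        # must land in the right leaf even if its feature is pushed down by eps
        prob += xi_d - eps >= b + gap - big_M * (1 - z[i])

prob.solve(pulp.PULP_CBC_CMD(msg=False))
split_feature = max(range(p), key=lambda j: d[j].value())
print("feature:", split_feature,
      "threshold:", b.value(),
      "robust training accuracy:", pulp.value(prob.objective) / n)
```

The paper's approach goes well beyond this stump: it learns full classification trees, handles adversarial perturbations of the features more generally, and uses a tailored solution algorithm rather than an off-the-shelf solver call; the sketch is only meant to show how a perturbation budget can be folded into the split constraints.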
