AE-SMOTE: A Multi-Modal Minority Oversampling Framework
Real-world binary classification tasks are often imbalanced, i.e., the minority class is much smaller than the majority class. This skew is challenging for machine learning algorithms, which tend to focus on the majority class and frequently misclassify minority samples. Oversampling the minority class with \emph{SMOTE} before training is a popular way to address this challenge. Inspired by \emph{SMOTE}, we propose \emph{AE-SMOTE}, which uses an autoencoder to (1) map the features to a dense, continuous latent space, (2) oversample by interpolation in the latent space, and (3) map the synthetic samples back to the original feature space. While \emph{SMOTE} supports discrete (categorical) features, almost all of its variants and extensions do not. Wrapping any of these variants with an autoencoder enables it to support multi-modal datasets that include discrete features. We empirically demonstrate the effectiveness of the proposed approach on 35 publicly available datasets.
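The three steps above (encode, interpolate in the latent space, decode) can be illustrated with a minimal sketch. This is not the authors' implementation: as a loudly stated assumption, a linear autoencoder built from PCA (via SVD) stands in for a trained neural autoencoder, and only continuous features are handled. Supporting categorical features would require an encoding and decoding scheme that is not shown here. The names `fit_linear_autoencoder`, `smote_interpolate`, and `ae_smote_sketch` are hypothetical and chosen for illustration only.

```python
import numpy as np


def fit_linear_autoencoder(X, latent_dim):
    """ASSUMPTION: a PCA projection stands in for a trained autoencoder."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:latent_dim]  # shape: (latent_dim, n_features)

    def encode(A):
        return (A - mean) @ components.T

    def decode(Z):
        return Z @ components + mean

    return encode, decode


def smote_interpolate(Z, n_new, k=5, rng=None):
    """SMOTE-style interpolation between points and their nearest neighbors."""
    rng = np.random.default_rng(rng)
    n = len(Z)
    k = min(k, n - 1)  # a point cannot be its own neighbor
    # Pairwise Euclidean distances; the diagonal is masked out.
    dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    # For each synthetic sample: a random base point and one of its neighbors.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return Z[base] + gap * (Z[picked] - Z[base])


def ae_smote_sketch(X_minority, n_new, latent_dim=2, k=5, rng=None):
    """Encode minority samples, oversample in latent space, decode back."""
    encode, decode = fit_linear_autoencoder(X_minority, latent_dim)
    Z = encode(X_minority)                              # step (1)
    Z_new = smote_interpolate(Z, n_new, k=k, rng=rng)   # step (2)
    return decode(Z_new)                                # step (3)
```

For example, `ae_smote_sketch(X_minority, n_new=100)` returns an array of 100 synthetic rows with the same number of columns as `X_minority`. Because the interpolation step only sees latent vectors, any other interpolation-based oversampler could in principle be substituted for `smote_interpolate` without changing the encode and decode steps.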