THE EFFECT OF ADVERSARIAL TRAINING: A THEORETICAL CHARACTERIZATION
It has been widely shown that adversarial training (Madry et al., 2018) is empirically effective in defending against adversarial attacks. However, theoretical understanding of how the solution of adversarial training differs from that of standard training remains limited. In this paper, we characterize the solution of adversarial training for linear classification problems over the full range of the adversarial radius ε. Specifically, we show that if the data are ε-strongly linearly separable, adversarial training with radius smaller than ε converges to the hard-margin SVM solution at a faster rate than standard training. If the data are not ε-strongly linearly separable, we show that adversarial training with radius ε is stable to outliers, while standard training is not. Moreover, we prove that the classifier returned by adversarial training with a large radius ε has low confidence on each data point. Experiments corroborate our theoretical findings.
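To make the separable-case claim concrete, here is a minimal sketch. It is not the authors' code, and it rests on several assumptions: ℓ2-bounded perturbations, a linear classifier f(x) = w·x without bias, the logistic loss, and full-batch gradient descent. The toy dataset and every function name (`robust_margin`, `train`, `normalized_margin`) are hypothetical. For a linear model the inner maximization has a closed form: the worst-case perturbation of radius ε lowers each margin y·w·x by exactly ε·‖w‖. The toy data have ℓ2 margin 1, with the hard-margin SVM direction (1, 0). We therefore take ε = 0.5 to fall in the "radius below the margin" regime, which is our reading of the separable case.

```python
import math

# Hypothetical toy data: 2-D points with labels in {-1, +1}.
# The hard-margin SVM direction is (1, 0) with margin 1, set by the point (1, 0).
DATA = [((1.0, 0.0), 1), ((-1.0, 0.0), -1), ((2.0, 1.0), 1), ((-2.0, -1.0), -1)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def robust_margin(w, x, y, eps):
    # Assumes an l2 attack ||delta|| <= eps on f(x) = w.x: the worst case is
    # delta = -eps * y * w / ||w||, which lowers the margin by eps * ||w||.
    return y * dot(w, x) - eps * norm(w)


def train(data, eps, lr=0.1, steps=2000):
    """Full-batch gradient descent on the mean robust logistic loss
    log(1 + exp(-robust_margin)); eps = 0 is standard training."""
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        n_w = norm(w)
        # Unit direction of w; use 0 at w = 0, where ||w|| is not differentiable.
        u = [wi / n_w for wi in w] if n_w > 0 else [0.0] * len(w)
        grad = [0.0] * len(w)
        for x, y in data:
            # d/dm log(1 + exp(-m)) = -sigmoid(-m)
            coeff = -sigmoid(-robust_margin(w, x, y, eps))
            for j in range(len(w)):
                # d(robust_margin)/dw_j = y * x_j - eps * u_j
                grad[j] += coeff * (y * x[j] - eps * u[j]) / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def normalized_margin(w, data):
    # Geometric margin of the direction of w; the SVM direction maximizes it (here: 1).
    return min(y * dot(w, x) for x, y in data) / norm(w)


w_std = train(DATA, eps=0.0)
w_adv = train(DATA, eps=0.5)  # radius 0.5 is below the data margin of 1
print(normalized_margin(w_std, DATA), normalized_margin(w_adv, DATA))
```

After the same number of steps, the adversarially trained direction should sit closer to the SVM direction, so its normalized margin is nearer 1. The linear setting is tractable because the closed-form worst case replaces the iterative inner attack (such as PGD) that adversarial training needs for general models.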