Can Perceptual Guidance Lead to Semantically Explainable Adversarial Perturbations?

24 Jun 2021  ·  P Charantej Reddy, Aditya Siripuram, Sumohana S. Channappayya

It is well known that carefully crafted imperceptible perturbations can cause state-of-the-art deep learning classification models to misclassify. Understanding and analyzing these adversarial perturbations is crucial to the design of robust convolutional neural networks; however, their mechanics remain poorly understood. In this work, we attempt to understand these mechanics by systematically answering the following question: do imperceptible adversarial perturbations concentrate on changing the regions of an image that are important for classification? In other words, are imperceptible adversarial perturbations semantically explainable? Most current methods use the $l_p$ distance to generate adversarial perturbations and to characterize their imperceptibility. However, since $l_p$ distances measure only pixel-to-pixel differences and do not account for image structure, these methods cannot satisfactorily answer the above question. To address this issue, we propose a novel framework for generating adversarial perturbations by explicitly incorporating a "perceptual quality ball" constraint in our formulation. Specifically, we pose adversarial example generation as a tractable convex optimization problem, with constraints derived from a mathematically amenable variant of the popular SSIM index. We use the MobileNetV2 network trained on the ImageNet dataset for our experiments. By comparing the SSIM maps generated by our method with class activation maps, we show that perceptually guided perturbations introduce changes specifically in the regions that contribute to classification decisions, i.e., these perturbations are semantically explainable.
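The core idea of a "perceptual quality ball" can be illustrated with a minimal sketch: instead of bounding a perturbation by an $l_p$ norm, we require that the SSIM between the clean and perturbed images stay above a threshold. The sketch below uses a simplified global SSIM (no sliding window) and a bisection search for the largest step along a given perturbation direction that keeps the image inside the SSIM ball. This is an illustrative approximation, not the paper's exact convex formulation; the function names, the toy `ssim` implementation, and the bisection scheme are all assumptions made here for demonstration.

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Simplified global SSIM over whole images (no sliding window).

    This toy version combines luminance, contrast, and structure terms
    computed from global statistics; real SSIM uses local windows.
    """
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

def perturb_within_ssim_ball(x, direction, ssim_min=0.95, iters=30):
    """Scale `direction` so the perturbed image stays inside the SSIM ball.

    Bisection finds (approximately) the largest step t with
    ssim(x, x + t * d) >= ssim_min, i.e., the boundary of the
    "perceptual quality ball" along this direction.
    """
    d = direction / (np.linalg.norm(direction) + 1e-12)
    lo, hi = 0.0, 1.0
    # Grow the upper bracket until the constraint is violated (or capped).
    while ssim(x, x + hi * d) >= ssim_min and hi < 1e3:
        lo, hi = hi, hi * 2
    # Bisect: lo always satisfies the constraint, hi violates it.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ssim(x, x + mid * d) >= ssim_min:
            lo = mid
        else:
            hi = mid
    return x + lo * d
```

In an actual attack, `direction` would come from the classifier's loss gradient; here any direction demonstrates the projection-onto-the-ball idea that replaces the usual $l_p$ clipping step.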
