Feature Denoising for Improving Adversarial Robustness

Adversarial attacks on image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature denoising networks substantially improve the state-of-the-art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method was ranked first in the Competition on Adversarial Attacks and Defenses (CAAD) 2018, achieving 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers and surpassing the runner-up approach by ~10%. Code is available at https://github.com/facebookresearch/ImageNet-Adversarial-Training.
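
As a rough illustration of the denoising blocks described in the abstract, the sketch below shows one plausible variant in PyTorch: a softmax-normalized dot-product non-local means over the spatial positions of a feature map, followed by a 1x1 convolution and a residual connection. The class name, layer sizes, and the specific affinity function are illustrative assumptions, not the authors' implementation (the released code is in the repository linked above).

```python
# Minimal sketch (assumed PyTorch, illustrative only) of a feature-denoising
# block: non-local means over spatial positions + 1x1 conv + residual add.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalDenoiseBlock(nn.Module):
    """Hypothetical denoising block: softmax dot-product non-local means."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution applied to the denoised features before the residual add
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        feats = x.view(n, c, h * w)                            # (N, C, HW)
        # Pairwise dot-product affinities between all spatial positions
        affinity = torch.bmm(feats.transpose(1, 2), feats)     # (N, HW, HW)
        weights = F.softmax(affinity, dim=-1)                  # normalize per query position
        # Each output position becomes a weighted mean of features at all positions
        denoised = torch.bmm(feats, weights.transpose(1, 2))   # (N, C, HW)
        denoised = denoised.view(n, c, h, w)
        # 1x1 conv plus identity shortcut keeps the block trainable end-to-end
        return x + self.conv(denoised)

# Usage: denoise a batch of ResNet-style feature maps
block = NonLocalDenoiseBlock(channels=256)
features = torch.randn(2, 256, 14, 14)
print(block(features).shape)  # torch.Size([2, 256, 14, 14])
```

Per the abstract, the key design point is that such blocks are part of the network itself and are trained end-to-end together with adversarial training, rather than applied as a separate preprocessing step.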


Datasets

ImageNet, CAAD 2018

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Adversarial Defense | CAAD 2018 | Feature Denoising | Accuracy | 50.6% | #1 |
| Adversarial Defense | ImageNet | Feature Denoising | Accuracy | 49.5% | #3 |
| Adversarial Defense | ImageNet (targeted PGD, max perturbation=16) | ResNeXt-101 DenoiseAll | Accuracy | 40.4% | #2 |
| Adversarial Defense | ImageNet (targeted PGD, max perturbation=16) | ResNet-152 | Accuracy | 39.0% | #3 |
| Adversarial Defense | ImageNet (targeted PGD, max perturbation=16) | ResNet-152 Denoise | Accuracy | 42.8% | #1 |
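
The targeted-PGD rows above refer to an L_inf-bounded iterative attack with maximum perturbation 16 on the 0-255 pixel scale. Below is a minimal, hypothetical PyTorch sketch of such a targeted PGD loop, assuming images in [0, 1] (so eps = 16/255); the step size, iteration count, and the `targeted_pgd` helper name are illustrative assumptions rather than the exact benchmark configuration.

```python
# Minimal sketch (illustrative assumptions) of targeted L_inf PGD with eps = 16/255.
import torch
import torch.nn.functional as F

def targeted_pgd(model, images, target_labels, eps=16 / 255, step=1 / 255, iters=10):
    """Push the model's predictions toward target_labels within an eps-ball around images."""
    adv = images.clone().detach()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), target_labels)
        grad = torch.autograd.grad(loss, adv)[0]
        # Targeted attack: take a signed gradient *descent* step toward the target class
        adv = adv.detach() - step * grad.sign()
        # Project back into the eps-ball and the valid pixel range
        adv = torch.min(torch.max(adv, images - eps), images + eps).clamp(0.0, 1.0)
    return adv
```

Evaluation under this setting would then measure the defended model's classification accuracy on the adversarial images returned by such a loop.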

Methods


No methods listed for this paper.