Improving robustness against common corruptions by covariate shift adaptation
Today's state-of-the-art machine vision models are vulnerable to image corruptions like blurring or compression artefacts, limiting their performance in many real-world applications. We argue that popular benchmarks for measuring model robustness against common corruptions (like ImageNet-C) underestimate model robustness in many (but not all) application scenarios. The key insight is that in many scenarios, multiple unlabeled examples of the corruptions are available and can be used for unsupervised online adaptation. Replacing the activation statistics estimated by batch normalization on the training set with the statistics of the corrupted images consistently improves robustness across 25 different popular computer vision models. Using the corrected statistics, ResNet-50 reaches 62.2% mCE on ImageNet-C compared to 76.7% without adaptation. With the more robust DeepAugment+AugMix model, we improve the state of the art achieved by a ResNet-50 model to date from 53.6% mCE to 45.4% mCE. Even adapting to a single sample improves robustness for the ResNet-50 and AugMix models, and 32 samples are sufficient to improve the current state of the art for a ResNet-50 architecture. We argue that results with adapted statistics should be included whenever reporting scores in corruption benchmarks and other out-of-distribution generalization settings.
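The core idea in the abstract, swapping the training-set batch normalization statistics for statistics computed on the corrupted inputs, can be illustrated with a small sketch. This is not the authors' implementation: it is plain Python on toy feature vectors, it omits the learned affine parameters, and the function names (`batch_statistics`, `batch_norm`) and all numbers are hypothetical.

```python
import math

def batch_statistics(batch):
    """Per-channel mean and (biased) variance over a batch of feature vectors."""
    n = len(batch)
    channels = len(batch[0])
    means = [sum(x[c] for x in batch) / n for c in range(channels)]
    variances = [
        sum((x[c] - means[c]) ** 2 for x in batch) / n for c in range(channels)
    ]
    return means, variances

def batch_norm(batch, means, variances, eps=1e-5):
    """Normalize each channel with the given statistics (affine part omitted)."""
    return [
        [(x[c] - means[c]) / math.sqrt(variances[c] + eps) for c in range(len(x))]
        for x in batch
    ]

# Hypothetical statistics stored in the layer after training on clean data.
train_means, train_vars = [0.0, 0.0], [1.0, 1.0]

# Hypothetical activations for a batch of corrupted images: the corruption
# has shifted and rescaled them relative to the training distribution.
corrupted = [[2.0, -1.0], [4.0, -3.0], [3.0, -2.0], [5.0, -4.0]]

# Without adaptation: the stale training statistics leave the activations
# off-centre, so later layers see inputs they were not trained on.
unadapted = batch_norm(corrupted, train_means, train_vars)

# With adaptation: re-estimate the statistics from the unlabeled corrupted batch.
adapted_means, adapted_vars = batch_statistics(corrupted)
adapted = batch_norm(corrupted, adapted_means, adapted_vars)
```

In this toy case the adapted activations are again centred at zero with roughly unit variance per channel, while the unadapted ones keep the shift introduced by the corruption. No labels are used at any point, which is what makes the adaptation unsupervised.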
NeurIPS 2020
Task | Dataset | Model | Metric Name | Metric Value (%) | Global Rank | Uses Extra Training Data | Benchmark |
---|---|---|---|---|---|---|---|
Unsupervised Domain Adaptation | ImageNet-C | ResNeXt101+DeepAug+AugMix, BatchNorm Adaptation, 8 samples | mean Corruption Error (mCE) | 40.7 | # 6 | | |
Unsupervised Domain Adaptation | ImageNet-C | ResNet50 (baseline), BatchNorm Adaptation, full adaptation | mean Corruption Error (mCE) | 62.2 | # 15 | | |
Unsupervised Domain Adaptation | ImageNet-C | ResNet50 (baseline), BatchNorm Adaptation, 8 samples | mean Corruption Error (mCE) | 65.0 | # 16 | | |
Unsupervised Domain Adaptation | ImageNet-C | ResNeXt101+DeepAug+AugMix, BatchNorm Adaptation, full adaptation | mean Corruption Error (mCE) | 38.0 | # 5 | | |
Unsupervised Domain Adaptation | ImageNet-C | ResNet50+DeepAug+AugMix, BatchNorm Adaptation, 8 samples | mean Corruption Error (mCE) | 48.4 | # 12 | | |
Unsupervised Domain Adaptation | ImageNet-C | ResNet50+DeepAug+AugMix, BatchNorm Adaptation, full adaptation | mean Corruption Error (mCE) | 45.4 | # 11 | | |
Unsupervised Domain Adaptation | ImageNet-R | ResNet50, BatchNorm Adaptation | Top 1 Error | 59.9 | # 8 | | |
Unsupervised Domain Adaptation | ImageNet-R | ResNeXt101+DeepAug+AugMix, BatchNorm Adaptation | Top 1 Error | 44.0 | # 4 | | |
Unsupervised Domain Adaptation | ImageNet-R | ResNet50+DeepAug+AugMix, BatchNorm Adaptation | Top 1 Error | 48.9 | # 5 | | |
Image Classification | ObjectNet | ResNet-50 + GroupNorm | Top-5 Accuracy | 50.2 | # 21 | | |
Image Classification | ObjectNet | ResNet-50 + GroupNorm | Top-1 Accuracy | 29.2 | # 70 | | |
Image Classification | ObjectNet | ResNet-50 + FixUp | Top-5 Accuracy | 48.6 | # 25 | | |
Image Classification | ObjectNet | ResNet-50 + FixUp | Top-1 Accuracy | 28.5 | # 74 | | |
Image Classification | ObjectNet | ResNet-50 + RoHL | Top-1 Accuracy | 29.2 | # 70 | | |
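
For readers unfamiliar with the mCE metric reported above, the following is a minimal sketch of how a mean Corruption Error is typically computed for ImageNet-C: for each corruption type, the model's top-1 errors summed over severity levels are divided by the same sum for a reference model (AlexNet in the ImageNet-C benchmark), and the resulting ratios are averaged. That definition is background knowledge rather than something stated on this page, and the function name and all error values below are hypothetical.

```python
def mean_corruption_error(model_errors, baseline_errors):
    """Average, over corruption types, of summed model error / summed baseline error.

    Both arguments map a corruption name to a list of top-1 errors (in percent),
    one entry per severity level. The result is returned in percent.
    """
    ratios = []
    for corruption, errors in model_errors.items():
        reference = baseline_errors[corruption]
        ratios.append(sum(errors) / sum(reference))
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical top-1 errors (percent) for two corruption types at two severities.
model = {"blur": [20, 40], "jpeg": [10, 30]}
baseline = {"blur": [40, 80], "jpeg": [20, 60]}

mce = mean_corruption_error(model, baseline)
```

With these made-up numbers the model makes half as many errors as the reference on each corruption, so `mce` comes out at 50.0. A value below 100 therefore means the model is more robust than the reference, and lower is better.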