Revisiting Weakly Supervised Pre-Training of Visual Perception Models

Model pre-training is a cornerstone of modern visual recognition systems. Although fully supervised pre-training on datasets like ImageNet is still the de-facto standard, recent studies suggest that large-scale weakly supervised pre-training can outperform fully supervised approaches. This paper revisits weakly supervised pre-training of models using hashtag supervision with modern versions of residual networks and the largest-ever dataset of images and corresponding hashtags. We study the performance of the resulting models in various transfer-learning settings, including zero-shot transfer. We also compare our models with those obtained via large-scale self-supervised learning. We find our weakly supervised models to be very competitive across all settings, and find that they substantially outperform their self-supervised counterparts. We also include an investigation into whether our models learned potentially troubling associations or stereotypes. Overall, our results provide a compelling argument for the use of weakly supervised learning in the development of visual recognition systems. Our models, Supervised Weakly through hashtAGs (SWAG), are available publicly.
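Hashtag supervision treats each image's hashtags as noisy multi-label targets over a fixed hashtag vocabulary. One common formulation, consistent with the abstract's description, is cross-entropy against a uniform soft target over an image's k hashtags. A minimal NumPy sketch (the vocabulary, function name, and target scheme are illustrative, not the paper's exact recipe):

```python
import numpy as np

def hashtag_xent(logits, hashtag_ids):
    """Cross-entropy of a softmax over the hashtag vocabulary against a
    uniform soft target (weight 1/k) over the image's k hashtags.
    A sketch of one common multi-label hashtag objective."""
    z = logits - logits.max()                 # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())   # log-softmax over the vocabulary
    k = len(hashtag_ids)
    return -log_probs[hashtag_ids].sum() / k

# Toy 4-hashtag vocabulary; the image carries hashtags 0 and 3.
logits = np.array([2.0, 0.5, -1.0, 0.0])
loss = hashtag_xent(logits, [0, 3])
```

The uniform 1/k weighting keeps the loss scale comparable across images with different numbers of hashtags; the loss is lowest when probability mass concentrates on the image's own tags.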

CVPR 2022

Results from the Paper


 Ranked #1 on Out-of-Distribution Generalization on ImageNet-W (using extra training data)

| Task | Dataset | Model | Metric | Value | Rank |
|---|---|---|---|---|---|
| Fine-Grained Image Classification | CUB-200-2011 | SWAG (ViT H/14) | Accuracy | 91.7 | #4 |
| Image Classification | ImageNet | SWAG (ViT H/14) | Top-1 Accuracy | 88.6% | #44 |
| | | | Number of params | 633.5M | #943 |
| | | | GFLOPs | 1018.8 | #488 |
| Image Classification | ImageNet ReaL | SWAG (RegNetY 128GF) | Accuracy | 90.7% | #13 |
| Image Classification | ImageNet V2 | SWAG (ViT H/14) | Top-1 Accuracy | 81.1 | #9 |
| Out-of-Distribution Generalization | ImageNet-W | SWAG (ViT-B/16, linear probing, IG-3.6B) | IN-W Gap | -7.7 | #1 |
| | | | Carton Gap | +18 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | SWAG (RegNet-32gf, fine-tuning, IG-3.6B) | IN-W Gap | -4.5 | #1 |
| | | | Carton Gap | +30 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | SWAG (RegNet-32gf, linear probing, IG-3.6B) | IN-W Gap | -6.5 | #1 |
| | | | Carton Gap | +22 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | SWAG (ViT-H/14, fine-tuning, IG-3.6B) | IN-W Gap | -3.1 | #1 |
| | | | Carton Gap | +18 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | SWAG (ViT-H/14, linear probing, IG-3.6B) | IN-W Gap | -4.9 | #1 |
| | | | Carton Gap | +8 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | SWAG (ViT-L/16, fine-tuning, IG-3.6B) | IN-W Gap | -3.2 | #1 |
| | | | Carton Gap | +20 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | SWAG (ViT-L/16, linear probing, IG-3.6B) | IN-W Gap | -5.7 | #1 |
| | | | Carton Gap | +6 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | SWAG (ViT-B/16, fine-tuning, IG-3.6B) | IN-W Gap | -5.4 | #1 |
| | | | Carton Gap | +24 | #1 |
| Image Classification | iNaturalist 2018 | SWAG (ViT H/14) | Top-1 Accuracy | 86.0% | #7 |
| Image Classification | ObjectNet | ViT H/14 (Platt) | Top-1 Accuracy | 60 | #20 |
| Image Classification | ObjectNet | RegNetY 128GF (Platt) | Top-1 Accuracy | 64.3 | #17 |
| Image Classification | ObjectNet | ViT L/16 (Platt) | Top-1 Accuracy | 57.3 | #22 |
| Image Classification | ObjectNet | SWAG (ViT H/14) | Top-1 Accuracy | 69.5 | #15 |
| Image Classification | ObjectNet | ViT B/16 | Top-1 Accuracy | 48.9 | #29 |
| Image Classification | Places365-Standard | SWAG (ViT H/14) | Top-1 Accuracy | 60.7 | #1 |
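The IN-W Gap rows read as accuracy deltas: the ImageNet-W benchmark overlays a transparent watermark on ImageNet validation images, and the gap is top-1 accuracy on ImageNet-W minus top-1 on clean ImageNet, so values closer to zero suggest less reliance on the watermark shortcut. A minimal sketch (the function name is illustrative):

```python
def in_w_gap(top1_imagenet_w, top1_imagenet):
    """IN-W Gap in percentage points: watermarked minus clean top-1 accuracy.
    More negative means a larger accuracy drop when the watermark is added."""
    return top1_imagenet_w - top1_imagenet

# From the table: SWAG ViT-H/14 (fine-tuned) scores 88.6 top-1 on ImageNet
# with an IN-W Gap of -3.1, which implies roughly 85.5 top-1 on ImageNet-W.
gap = in_w_gap(85.5, 88.6)
```

The Carton Gap is defined analogously but counts predictions rather than accuracy: it is the increase in the fraction of images predicted as the "carton" class once the watermark is added.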

Methods


Vision Transformer (ViT), RegNetY, weakly supervised pre-training with hashtag supervision.
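Several ImageNet-W rows above distinguish linear probing from end-to-end fine-tuning. In a linear probe, the pre-trained backbone stays frozen and only a linear classifier is trained on its features. A self-contained sketch with synthetic stand-in features (all data and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for frozen backbone features: 2 classes, 64-dim features.
X = np.vstack([rng.normal(0.0, 1.0, (50, 64)), rng.normal(0.8, 1.0, (50, 64))])
y = np.repeat([0, 1], 50)

# Linear probe: the backbone (feature extractor) is frozen; only W, b train.
W = np.zeros((64, 2))
b = np.zeros(2)
for _ in range(200):
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)           # softmax probabilities
    grad = p.copy()
    grad[np.arange(len(y)), y] -= 1.0           # d(cross-entropy)/d(logits)
    W -= 0.1 * (X.T @ grad) / len(y)            # gradient step on weights
    b -= 0.1 * grad.mean(axis=0)                # gradient step on bias

acc = ((X @ W + b).argmax(axis=1) == y).mean()
```

Fine-tuning would instead backpropagate through the backbone as well; the table's pattern, where fine-tuned SWAG models show smaller IN-W gaps than their linear-probed counterparts, is consistent with the extra capacity that fine-tuning adapts.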