Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, reconstruction error). We present the Visual Task Adaptation Benchmark (VTAB), which defines good representations as those that adapt to diverse, unseen tasks with few examples. With VTAB, we conduct a large-scale study of many popular publicly-available representation learning algorithms. We carefully control confounders such as architecture and tuning budget. We address questions like: How effective are ImageNet representations beyond standard natural datasets? How do representations trained via generative and discriminative models compare? To what extent can self-supervision replace labels? And, how close are we to general visual representations?


Results from the Paper


Ranked #10 on Image Classification on VTAB-1k (using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Classification | VTAB-1k | S4L-Exemplar-ResNet50-LargeHyperSweep | Top-1 Accuracy | 72.7 | #7 |
| Image Classification | VTAB-1k | WAE-GAN | Top-1 Accuracy | 32.0 | #33 |
| Image Classification | VTAB-1k | Conditional-BigGAN | Top-1 Accuracy | 35.3 | #32 |
| Image Classification | VTAB-1k | WAE-MMD | Top-1 Accuracy | 37.3 | #31 |
| Image Classification | VTAB-1k | VAE | Top-1 Accuracy | 37.5 | #30 |
| Image Classification | VTAB-1k | ResNet50 | Top-1 Accuracy | 42.1 | #29 |
| Image Classification | VTAB-1k | Unconditional-BigGAN-ResNet50 | Top-1 Accuracy | 44.0 | #28 |
| Image Classification | VTAB-1k | SelfSup-RelativePatchLoc-ResNet50 | Top-1 Accuracy | 50.8 | #27 |
| Image Classification | VTAB-1k | SelfSup-Jigsaw-ResNet50 | Top-1 Accuracy | 51.1 | #26 |
| Image Classification | VTAB-1k | SelfSup-Exemplar-ResNet50 | Top-1 Accuracy | 57.5 | #24 |
| Image Classification | VTAB-1k | BigBiGAN-ResNet50 | Top-1 Accuracy | 59.1 | #22 |
| Image Classification | VTAB-1k | ResNet50-LargeHyperSweep | Top-1 Accuracy | 59.2 | #21 |
| Image Classification | VTAB-1k | SelfSup-Rotation-ResNet50 | Top-1 Accuracy | 59.5 | #20 |
| Image Classification | VTAB-1k | ImageNet-10%-ResNet50 | Top-1 Accuracy | 61.6 | #19 |
| Image Classification | VTAB-1k | S4L-10%-Exemplar-ResNet50 | Top-1 Accuracy | 63.9 | #18 |
| Image Classification | VTAB-1k | S4L-10%-Rotation-ResNet50 | Top-1 Accuracy | 64.8 | #17 |
| Image Classification | VTAB-1k | ImageNet-ResNet50-LargeHyperSweep | Top-1 Accuracy | 71.2 | #10 |
| Image Classification | VTAB-1k | S4L-Rotation-ResNet50-LargeHyperSweep | Top-1 Accuracy | 71.5 | #9 |
| Image Classification | VTAB-1k | WAE-UKL | Top-1 Accuracy | 31.0 | #34 |
| Image Classification | VTAB-1k | ImageNet-ResNet50 | Top-1 Accuracy | 65.6 | #16 |
| Image Classification | VTAB-1k | S4L-Exemplar-ResNet50 | Top-1 Accuracy | 67.0 | #14 |
| Image Classification | VTAB-1k | S4L-Rotation-ResNet50 | Top-1 Accuracy | 67.5 | #13 |

Methods