Generative Pretraining from Pixels

Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure.
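As a rough illustration of this setup, here is a minimal PyTorch sketch (not the authors' implementation): an image is flattened into a 1D sequence of pixel tokens, and a decoder-only Transformer with a causal attention mask is trained to predict each token from the ones before it. The `PixelGPT` class, the layer sizes, and the 512-token palette (standing in for the paper's reduced color vocabulary) are illustrative assumptions.

```python
# Minimal sketch of autoregressive pixel prediction, assuming a 32x32 input
# and a 512-entry pixel vocabulary; sizes here are illustrative, far smaller
# than iGPT-L/XL.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelGPT(nn.Module):
    def __init__(self, vocab_size=512, seq_len=32 * 32, d_model=256,
                 n_heads=8, n_layers=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Learned 1D positions: the model gets no 2D structure at all.
        self.pos_emb = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)
        # Causal mask: position i may only attend to positions <= i.
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), 1)
        self.register_buffer("causal_mask", mask)

    def forward(self, tokens):  # tokens: (B, T) integer pixel ids
        T = tokens.size(1)
        x = self.tok_emb(tokens) + self.pos_emb[:, :T]
        x = self.blocks(x, mask=self.causal_mask[:T, :T])
        return self.head(x)  # (B, T, vocab_size) next-token logits

# Training step: shift the sequence by one so each position predicts the
# next pixel token (standard next-token cross-entropy).
model = PixelGPT()
pixels = torch.randint(0, 512, (2, 32 * 32))  # stand-in pixel tokens
logits = model(pixels[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, 512), pixels[:, 1:].reshape(-1))
loss.backward()
```

After pretraining in this fashion, the learned representations would be evaluated with linear probes or fine-tuning on image classification, which is what the results below measure.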

ICML 2020

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Self-Supervised Image Classification | ImageNet | iGPT-L (32x32) | Top 1 Accuracy | 60.3% | #37 |
| Self-Supervised Image Classification | ImageNet | iGPT-L (48x48) | Top 1 Accuracy | 65.2% | #28 |
| Self-Supervised Image Classification | ImageNet | iGPT-XL (64x64, 3072 features) | Top 1 Accuracy | 68.7% | #22 |
| Self-Supervised Image Classification | ImageNet | iGPT-XL (64x64, 15360 features) | Top 1 Accuracy | 72.0% | #16 |
| Image Classification | STL-10 | AMDIM-L | Percentage correct | 94.2 | #20 |
| Image Classification | STL-10 | iGPT-L | Percentage correct | 97.1 | #11 |

Methods used in the Paper