Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Density Estimation	CIFAR-10	Image Transformer	NLL (bits/dim)	2.90	# 3
Image Generation	CIFAR-10	Image Transformer	NLL (bits/dim)	2.89	# 20
Density Estimation	ImageNet 32x32	Image Transformer	NLL (bits/dim)	3.77	# 3
Image Generation	ImageNet 32x32	Image Transformer	NLL (bits/dim)	3.77	# 7
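
The bits-per-dimension values in this table are negative log-likelihoods expressed in base 2 and averaged over every pixel channel of an image. As a rough illustration of the unit (a sketch, not code from the paper), the snippet below converts a total per-image NLL in nats to bits/dim. The function name `nats_to_bits_per_dim` is hypothetical, and the 3072-dimension figure assumes a 32x32 RGB image.

```python
import math


def nats_to_bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a total negative log-likelihood in nats to bits per dimension."""
    if num_dims <= 0:
        raise ValueError("num_dims must be positive")
    # Dividing by ln(2) changes the log base from e to 2;
    # dividing by num_dims averages over all pixel channels.
    return nll_nats / (num_dims * math.log(2))


# Assumption: a 32x32 RGB image has 32 * 32 * 3 = 3072 dimensions.
NUM_DIMS = 32 * 32 * 3

# Sanity check: a model that gives each of the 256 intensity values equal
# probability spends log2(256) = 8 bits on every dimension.
uniform_nll_nats = NUM_DIMS * math.log(256)
uniform_bpd = nats_to_bits_per_dim(uniform_nll_nats, NUM_DIMS)
```

Under this reading, a score of 2.90 bits/dim means the model needs far fewer bits per pixel channel than the 8 bits of the uniform baseline.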

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/image-transformer/density-estimation-on-cifar-10)](https://paperswithcode.com/sota/density-estimation-on-cifar-10?p=image-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/image-transformer/density-estimation-on-imagenet-32x32-1)](https://paperswithcode.com/sota/density-estimation-on-imagenet-32x32-1?p=image-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/image-transformer/image-generation-on-imagenet-32x32)](https://paperswithcode.com/sota/image-generation-on-imagenet-32x32?p=image-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/image-transformer/image-generation-on-cifar-10)](https://paperswithcode.com/sota/image-generation-on-cifar-10?p=image-transformer)`

Image Transformer

15 Feb 2018 · Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood. By restricting the self-attention mechanism to attend to local neighborhoods we significantly increase the size of images the model can process in practice, despite maintaining significantly larger receptive fields per layer than typical convolutional neural networks. While conceptually simple, our generative models significantly outperform the current state of the art in image generation on ImageNet, improving the best published negative log-likelihood on ImageNet from 3.83 to 3.77. We also present results on image super-resolution with a large magnification ratio, applying an encoder-decoder configuration of our architecture. In a human evaluation study, we find that images generated by our super-resolution model fool human observers three times more often than the previous state of the art.
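
The idea of restricting self-attention to a local neighborhood can be sketched in a few lines. The example below is a simplified illustration, not the paper's implementation: it assumes a flattened 1D sequence, a single attention head, no learned projections, and a plain sliding causal window rather than whatever block layout the authors use. The function name `local_causal_attention` and the parameter `window` are hypothetical.

```python
import numpy as np


def local_causal_attention(q, k, v, window):
    """Single-head attention where position i sees only positions
    max(0, i - window + 1) through i (itself and its recent past)."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                 # (n, n) scaled dot products
    idx = np.arange(n)
    offset = idx[:, None] - idx[None, :]          # offset[i, j] = i - j
    allowed = (offset >= 0) & (offset < window)   # causal and within the window
    scores = np.where(allowed, scores, -np.inf)   # mask everything else
    # Numerically stable softmax; each row keeps at least its diagonal entry.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy usage: 8 positions, 4 features, each attending to at most 3 positions.
rng = np.random.default_rng(0)
n, d, window = 8, 4, 3
x = rng.normal(size=(n, d))
out, weights = local_causal_attention(x, x, x, window)
```

This sketch still builds the full n-by-n score matrix for clarity. A practical implementation would compute only the entries inside each window, which is where the saving comes from: roughly n times `window` scores instead of n squared.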

Code

No code implementations yet.

Tasks

Density Estimation

Image Generation

Image Super-Resolution

Super-Resolution

Datasets

CIFAR-10

Results from the Paper

Ranked #3 on Density Estimation on CIFAR-10

Methods

No methods listed for this paper.
