TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Generation	FFHQ 256 x 256	Unleashing Transformers	FID	6.11	# 20
Image Generation	FFHQ 256 x 256	Unleashing Transformers (DINOv2)	FD	393.45	# 5
Image Generation	FFHQ 256 x 256	Unleashing Transformers (DINOv2)	Precision	0.76	# 6
Image Generation	FFHQ 256 x 256	Unleashing Transformers (DINOv2)	Recall	0.24	# 6
Image Generation	LSUN Bedroom 256 x 256	Unleashing Transformers (DINOv2)	FD	440.04	# 6
Image Generation	LSUN Bedroom 256 x 256	Unleashing Transformers (DINOv2)	Precision	0.78	# 8
Image Generation	LSUN Bedroom 256 x 256	Unleashing Transformers (DINOv2)	Recall	0.41	# 4
Image Generation	LSUN Bedroom 256 x 256	Unleashing Transformers	FID	3.64	# 7
Image Generation	LSUN Churches 256 x 256	Unleashing Transformers	FID	4.07	# 13

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unleashing-transformers-parallel-token/image-generation-on-lsun-bedroom-256-x-256)](https://paperswithcode.com/sota/image-generation-on-lsun-bedroom-256-x-256?p=unleashing-transformers-parallel-token)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unleashing-transformers-parallel-token/image-generation-on-ffhq-256-x-256)](https://paperswithcode.com/sota/image-generation-on-ffhq-256-x-256?p=unleashing-transformers-parallel-token)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unleashing-transformers-parallel-token/image-generation-on-lsun-churches-256-x-256)](https://paperswithcode.com/sota/image-generation-on-lsun-churches-256-x-256?p=unleashing-transformers-parallel-token)`

Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes

24 Nov 2021 · Sam Bond-Taylor, Peter Hessey, Hiroshi Sasaki, Toby P. Breckon, Chris G. Willcocks

Whilst diffusion probabilistic models can generate high quality image content, key limitations remain in terms of both generating high-resolution imagery and their associated high computational requirements. Recent Vector-Quantized image models have overcome this limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sampling from the prior. By contrast, in this paper we propose a novel discrete diffusion probabilistic model prior which enables parallel prediction of Vector-Quantized tokens by using an unconstrained Transformer architecture as the backbone. During training, tokens are randomly masked in an order-agnostic manner and the Transformer learns to predict the original tokens. This parallelism of Vector-Quantized token prediction in turn facilitates unconditional generation of globally consistent high-resolution and diverse imagery at a fraction of the computational expense. In this manner, we can generate image resolutions exceeding that of the original training set samples whilst additionally provisioning per-image likelihood estimates (in a departure from generative adversarial approaches). Our approach achieves state-of-the-art results in terms of Density (LSUN Bedroom: 1.51; LSUN Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73; FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN Churches: 4.07; FFHQ: 6.11) whilst offering advantages in terms of both computation and reduced training set requirements.
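The training step described in the abstract (randomly mask tokens in an order-agnostic way, then predict the originals) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `absorb_tokens` and `masked_cross_entropy`, the linear masking probability `t / num_steps`, the use of one extra index as the mask token, and the unweighted loss are all assumptions made for illustration, and the Transformer itself is replaced by placeholder logits.

```python
import numpy as np


def absorb_tokens(tokens, t, num_steps, mask_id, rng):
    """Illustrative forward (noising) step of an absorbing-state process.

    Each token is independently replaced by `mask_id` with probability
    t / num_steps (an assumed linear schedule). Position order plays no
    role, so the masking is order-agnostic.
    """
    tokens = np.asarray(tokens)
    drop = rng.random(tokens.shape) < (t / num_steps)
    noised = np.where(drop, mask_id, tokens)
    return noised, drop


def masked_cross_entropy(logits, targets, drop):
    """Mean cross-entropy over masked positions only (simplified, unweighted)."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    return -(picked * drop).sum() / max(drop.sum(), 1)


# Hypothetical setup: a 16 x 16 grid of Vector-Quantized codes.
rng = np.random.default_rng(0)
codebook_size = 1024            # assumed codebook size
mask_id = codebook_size         # assumed: one extra index acts as the mask token
tokens = rng.integers(0, codebook_size, size=(16, 16))

num_steps = tokens.size
t = int(rng.integers(1, num_steps + 1))   # random noise level for this example
noised, drop = absorb_tokens(tokens, t, num_steps, mask_id, rng)

# Stand-in for the Transformer output: one logit vector per position,
# predicted for all positions at once (the parallel prediction).
placeholder_logits = rng.normal(size=tokens.shape + (codebook_size,))
loss = masked_cross_entropy(placeholder_logits, tokens, drop)
print(f"masked {int(drop.sum())} of {tokens.size} tokens, loss = {loss:.3f}")
```

Because the stand-in network scores every position in one pass, the same interface would allow many masked tokens to be filled in per sampling step rather than one token at a time, which is the contrast with element-wise autoregressive sampling that the abstract draws.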

Code

samb-t/unleashing-transformers official

173

samb-t/x2ct-vqvae

Arktis2022/mini-vq-discrete-absorbi…

Tasks

Image Generation

Datasets

FFHQ

LSUN

Results from the Paper

Ranked #4 on Image Generation on LSUN Bedroom 256 x 256 (Recall metric)

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Diffusion • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
