TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Generation	Sky Time-lapse	Long-video GAN (128x128)	FVD16	107.5	# 4
Video Generation	Sky Time-lapse	LVDM (256x256)	KVD16	3.9	# 4
Video Generation	Sky Time-lapse	LVDM (256x256)	FVD16	95.2	# 3
Video Generation	Sky Time-lapse	MoCoGAN-HD (128x128)	KVD16	13.9	# 1
Video Generation	Sky Time-lapse	MoCoGAN-HD (128x128)	FVD16	183.6	# 8
Video Generation	Sky Time-lapse	TATS (128x128)	KVD16	5.7	# 3
Video Generation	Sky Time-lapse	TATS (128x128)	FVD16	132.6	# 7
Video Generation	Sky Time-lapse	Long-video GAN (256x256)	FVD16	116.5	# 6
Video Generation	Sky Time-lapse	DIGAN (128x128)	KVD16	6.8	# 2
Video Generation	Sky Time-lapse	DIGAN (128x128)	FVD16	114.6	# 5
Video Generation	Taichi	MoCoGAN-HD (128x128)	FVD16	144.7	# 5
Video Generation	Taichi	MoCoGAN-HD (128x128)	KVD16	25.4	# 1
Video Generation	Taichi	DIGAN (256x256)	FVD16	156.7	# 6
Video Generation	Taichi	LVDM (256x256)	FVD16	99	# 3
Video Generation	Taichi	LVDM (256x256)	KVD16	15.3	# 3
Video Generation	Taichi	DIGAN (128x128)	FVD16	128.1	# 4
Video Generation	Taichi	DIGAN (128x128)	KVD16	20.6	# 2
Video Generation	Taichi	TATS (128x128)	FVD16	94.6	# 2
Video Generation	Taichi	TATS (128x128)	KVD16	9.8	# 4
Video Generation	UCF-101	MCVD	FVD16	2460	# 36
Video Generation	UCF-101	MCVD	KVD16	148	# 6
Video Generation	UCF-101	LVDM (256x256, unconditional)	FVD16	372	# 21
Video Generation	UCF-101	LVDM (256x256, unconditional)	KVD16	27	# 1
Video Generation	UCF-101	LVDM (256x256, unconditional)	FVD16	552	# 28
Video Generation	UCF-101	LVDM (256x256, unconditional)	KVD16	42	# 3
Video Generation	UCF-101	TGAN-v2 (128x128)	FVD16	1209	# 34
Video Generation	UCF-101	VDM	FVD16	1396	# 35
Video Generation	UCF-101	VDM	KVD16	116	# 5

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/latent-video-diffusion-models-for-high/video-generation-on-taichi)](https://paperswithcode.com/sota/video-generation-on-taichi?p=latent-video-diffusion-models-for-high)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/latent-video-diffusion-models-for-high/video-generation-on-sky-time-lapse)](https://paperswithcode.com/sota/video-generation-on-sky-time-lapse?p=latent-video-diffusion-models-for-high)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/latent-video-diffusion-models-for-high/video-generation-on-ucf-101)](https://paperswithcode.com/sota/video-generation-on-ucf-101?p=latent-video-diffusion-models-for-high)`

Latent Video Diffusion Models for High-Fidelity Long Video Generation

23 Nov 2022 · Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, Qifeng Chen

AI-generated content has attracted lots of attention recently, but photo-realistic video synthesis is still challenging. Although many attempts using GANs and autoregressive models have been made in this area, the visual quality and length of generated videos are far from satisfactory. Diffusion models have shown remarkable results recently but require significant computational resources. To address this, we introduce lightweight video diffusion models by leveraging a low-dimensional 3D latent space, significantly outperforming previous pixel-space video diffusion models under a limited computational budget. In addition, we propose hierarchical diffusion in the latent space such that longer videos with more than one thousand frames can be produced. To further overcome the performance degradation issue for long video generation, we propose conditional latent perturbation and unconditional guidance that effectively mitigate the accumulated errors during the extension of video length. Extensive experiments on small domain datasets of different categories suggest that our framework generates more realistic and longer videos than previous strong baselines. We additionally provide an extension to large-scale text-to-video generation to demonstrate the superiority of our work. Our code and models will be made publicly available.
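The abstract names three mechanisms without giving formulas: hierarchical generation for long videos, conditional latent perturbation, and unconditional guidance. The sketch below is only one plausible reading of those ideas, not the authors' implementation. It assumes that "hierarchical" means generating sparse keyframes first and then filling the gaps between them, that perturbation means adding Gaussian noise to the conditioning latents, and that unconditional guidance takes the familiar classifier-free-guidance form. All function and parameter names (`hierarchical_schedule`, `perturb_condition`, `guided_prediction`, `stride`, `noise_std`, `scale`) are hypothetical.

```python
import random


def hierarchical_schedule(total_frames, stride):
    """Split frame indices into sparse keyframes and the gaps between them.

    Assumption: keyframes are generated first, then each gap is filled
    by a second model conditioned on its two neighbouring keyframes.
    """
    if total_frames < 1 or stride < 1:
        raise ValueError("total_frames and stride must be positive")
    keyframes = list(range(0, total_frames, stride))
    if keyframes[-1] != total_frames - 1:
        keyframes.append(total_frames - 1)  # always anchor the last frame
    gaps = [list(range(a + 1, b)) for a, b in zip(keyframes, keyframes[1:])]
    return keyframes, gaps


def perturb_condition(latents, noise_std, rng):
    """Add Gaussian noise to conditioning latents (assumed form of perturbation).

    Training on noisy conditions is meant to make the model tolerant of
    the imperfect frames it conditions on when a video is extended.
    """
    return [z + noise_std * rng.gauss(0.0, 1.0) for z in latents]


def guided_prediction(eps_cond, eps_uncond, scale):
    """Blend conditional and unconditional noise predictions.

    Assumed classifier-free-guidance form: uncond + scale * (cond - uncond).
    scale = 1 reproduces the conditional prediction unchanged.
    """
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    keyframes, gaps = hierarchical_schedule(total_frames=10, stride=4)
    print("keyframes:", keyframes)
    print("gaps:", gaps)
    rng = random.Random(0)
    print("perturbed:", perturb_condition([0.5, -0.2], noise_std=0.1, rng=rng))
    print("guided:", guided_prediction([1.0, 2.0], [0.0, 1.0], scale=2.0))
```

Under these assumptions, a stride of 4 over 10 frames places keyframes at indices 0, 4, 8, and 9, and every remaining index falls in exactly one gap, so the two stages together cover the whole clip.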

Code

yingqinghe/lvdm (official) · 407 stars

Tasks

Denoising

Image Generation

Text-to-Video Generation

Video Generation

Vocal Bursts Intensity Prediction

Datasets

UCF101

Results from the Paper

Ranked #2 on Video Generation on Taichi

Methods

Diffusion