Lumiere: A Space-Time Diffusion Model for Video Generation

We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
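The paper does not ship code, so the following PyTorch sketch only illustrates the factorized space-time down- and up-sampling idea described above: spatial resolution and temporal duration are both reduced so the network can process the whole clip at a coarse space-time scale in one pass. The module names (SpaceTimeDown, SpaceTimeUp), channel widths, and factor-of-2 strides are assumptions chosen for clarity, not the authors' architecture.

```python
# Illustrative sketch only -- NOT the Lumiere implementation.
# Shapes follow the (B, C, T, H, W) video convention.
import torch
import torch.nn as nn


class SpaceTimeDown(nn.Module):
    """Halve spatial resolution and temporal length of a video tensor."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Spatial 2x downsampling (stride on H, W only).
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3),
                                 stride=(1, 2, 2), padding=(0, 1, 1))
        # Temporal 2x downsampling (stride on T only).
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(3, 1, 1),
                                  stride=(2, 1, 1), padding=(1, 0, 0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.temporal(self.spatial(x))


class SpaceTimeUp(nn.Module):
    """Invert SpaceTimeDown: double temporal length and spatial resolution."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.temporal = nn.ConvTranspose3d(in_ch, in_ch, kernel_size=(4, 1, 1),
                                           stride=(2, 1, 1), padding=(1, 0, 0))
        self.spatial = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=(1, 4, 4),
                                          stride=(1, 2, 2), padding=(0, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(self.temporal(x))


if __name__ == "__main__":
    video = torch.randn(1, 64, 16, 64, 64)   # (B, C, T, H, W)
    down = SpaceTimeDown(64, 128)
    up = SpaceTimeUp(128, 64)
    coarse = down(video)                      # -> (1, 128, 8, 32, 32)
    print(coarse.shape)
    print(up(coarse).shape)                   # -> (1, 64, 16, 64, 64)
```

In a full Space-Time U-Net, blocks like these would be stacked with skip connections so that the coarsest level sees the entire clip at once, which is what distinguishes this design from keyframe-then-temporal-super-resolution pipelines.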

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Video Generation | UCF-101 | Lumiere (Zero-shot, 1024x1024, text-conditional) | Inception Score | 37.54 | #18 |
| Video Generation | UCF-101 | Lumiere (Zero-shot, 1024x1024, text-conditional) | FVD16 | 332.49 | #16 |
| Text-to-Video Generation | UCF-101 | Lumiere (Zero-shot, 1024x1024) | FVD16 | 332.49 | #6 |
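For reference, FVD16 is a Fréchet Video Distance computed between Gaussian statistics of features extracted from 16-frame real and generated clips. The sketch below shows only the Fréchet distance itself; the benchmark's fixed feature extractor (I3D), preprocessing, and clip sampling are omitted, and the function name and stand-in features are hypothetical.

```python
# Minimal sketch of the Frechet distance underlying FVD (assumed form; the
# official evaluation pipeline additionally fixes the feature extractor,
# 16-frame clip length, and preprocessing, which are omitted here).
import numpy as np
from scipy import linalg


def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """Frechet distance between Gaussians fit to real/generated features."""
    diff = mu_r - mu_g
    # Matrix square root of the product of covariances.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))


# Stand-in random features; in real use these would be clip embeddings.
real = np.random.randn(256, 64)
fake = np.random.randn(256, 64)
fvd = frechet_distance(real.mean(0), np.cov(real, rowvar=False),
                       fake.mean(0), np.cov(fake, rowvar=False))
print(fvd)
```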