Learning Joint Spatial-Temporal Transformations for Video Inpainting

ECCV 2020 · Yanhong Zeng, Jianlong Fu, Hongyang Chao

High-quality video inpainting that completes missing regions in video frames is a promising yet challenging task. State-of-the-art approaches adopt attention models to complete a frame by searching for missing content in reference frames, and then complete whole videos frame by frame. However, these approaches can suffer from inconsistent attention results along the spatial and temporal dimensions, which often leads to blurriness and temporal artifacts in videos. In this paper, we propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting. Specifically, we simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN with a spatial-temporal adversarial loss. To show the superiority of the proposed model, we conduct both quantitative and qualitative evaluations using standard stationary masks and more realistic moving object masks. Demo videos are available at https://github.com/researchmm/STTN.
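The core idea, filling all frames at once with self-attention over patches drawn from every input frame, can be pictured with the minimal PyTorch sketch below. It is an illustrative assumption of how joint spatial-temporal attention may be wired, not the authors' implementation: the class name `SpatialTemporalAttention`, the patch size, and the channel count are ours, and STTN itself additionally uses multi-scale patches inside a full encoder-decoder (see the official repository).

```python
# Minimal sketch of joint spatial-temporal self-attention over multi-frame
# patches. Shapes, names, and hyperparameters are illustrative assumptions,
# not the official STTN code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTemporalAttention(nn.Module):
    def __init__(self, channels=256, patch_size=8):
        super().__init__()
        self.patch_size = patch_size
        # 1x1 convolutions produce query/key/value features per frame.
        self.to_q = nn.Conv2d(channels, channels, 1)
        self.to_k = nn.Conv2d(channels, channels, 1)
        self.to_v = nn.Conv2d(channels, channels, 1)

    def forward(self, feats):
        # feats: (B, T, C, H, W) encoder features of all input frames.
        b, t, c, h, w = feats.shape
        x = feats.view(b * t, c, h, w)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)

        def to_patches(y):
            # Split each frame into non-overlapping patches and flatten them,
            # so attention runs jointly over space (patches) and time (frames).
            p = self.patch_size
            y = F.unfold(y, kernel_size=p, stride=p)                 # (B*T, C*p*p, N)
            y = y.view(b, t, c * p * p, -1)
            return y.permute(0, 1, 3, 2).reshape(b, -1, c * p * p)   # (B, T*N, D)

        q, k, v = to_patches(q), to_patches(k), to_patches(v)

        # Scaled dot-product attention across all patches of all frames.
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        out = attn @ v                                               # (B, T*N, D)

        # Fold the attended patches back into per-frame feature maps.
        p = self.patch_size
        n = (h // p) * (w // p)
        out = out.view(b, t, n, c * p * p).permute(0, 1, 3, 2).reshape(b * t, c * p * p, n)
        out = F.fold(out, output_size=(h, w), kernel_size=p, stride=p)
        return out.view(b, t, c, h, w)


if __name__ == "__main__":
    layer = SpatialTemporalAttention(channels=256, patch_size=8)
    frames = torch.randn(1, 5, 256, 64, 64)   # features of 5 frames
    print(layer(frames).shape)                # torch.Size([1, 5, 256, 64, 64])
```

Because every patch attends to patches from all frames, missing regions in one frame can be completed from visible content in any other frame, which is what keeps the results spatially and temporally consistent.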

| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Video Inpainting | DAVIS | STTN | PSNR | 30.67 | #5 |
| Video Inpainting | DAVIS | STTN | SSIM | 0.9560 | #4 |
| Video Inpainting | DAVIS | STTN | VFID | 0.149 | #4 |
| Video Inpainting | DAVIS | STTN | Ewarp | 0.1449 | #3 |
| Seeing Beyond the Visible | KITTI360-EX | STTN | Average PSNR | 18.73 | #5 |
| Video Inpainting | YouTube-VOS 2018 | STTN | PSNR | 32.34 | #5 |
| Video Inpainting | YouTube-VOS 2018 | STTN | SSIM | 0.9655 | #5 |
| Video Inpainting | YouTube-VOS 2018 | STTN | VFID | 0.053 | #4 |
| Video Inpainting | YouTube-VOS 2018 | STTN | Ewarp | 0.0907 | #3 |
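For reference, the PSNR reported above is the standard peak signal-to-noise ratio, 10·log10(MAX² / MSE), averaged over frames. The snippet below is a minimal, hypothetical implementation assuming frames are float arrays scaled to [0, 1]; SSIM, VFID, and Ewarp (flow-warping error) need dedicated implementations (structural similarity, video perceptual features, and optical flow) and are not shown.

```python
# Minimal per-frame PSNR sketch; assumes pixel values in [0, 1].
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    clean = np.random.rand(240, 432, 3)
    noisy = np.clip(clean + 0.01 * np.random.randn(*clean.shape), 0.0, 1.0)
    print(f"PSNR: {psnr(noisy, clean):.2f} dB")
```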
