A Simple Baseline for Video Restoration with Grouped Spatial-temporal Shift

Video restoration, which aims to restore clear frames from degraded videos, has numerous important applications. The key to video restoration lies in utilizing inter-frame information. However, existing deep learning methods often rely on complicated network architectures, such as optical flow estimation, deformable convolution, and cross-frame self-attention layers, resulting in high computational costs. In this study, we propose a simple yet effective framework for video restoration. Our approach is based on grouped spatial-temporal shift, a lightweight and straightforward technique that can implicitly capture inter-frame correspondences for multi-frame aggregation. By introducing grouped spatial shift, we attain expansive effective receptive fields. Combined with basic 2D convolution, this simple framework can effectively aggregate inter-frame information. Extensive experiments demonstrate that our framework outperforms the previous state-of-the-art method, while using less than a quarter of its computational cost, on both video deblurring and video denoising tasks. These results indicate that our approach can significantly reduce computational overhead while maintaining high-quality results. Code is available at https://github.com/dasongli1/Shift-Net.
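To make the core idea concrete, below is a minimal PyTorch sketch of a grouped spatial-temporal shift. The function name, the specific shift offsets, and the boundary handling are illustrative assumptions rather than the authors' exact configuration (see the linked repository for that); the point is that after displacing channel groups across frames and spatial positions, an ordinary 2D convolution can aggregate inter-frame information.

```python
import torch


def grouped_spatial_temporal_shift(
    x,
    temporal_shifts=(-1, 1),
    spatial_shifts=((-4, 0), (4, 0), (0, -4), (0, 4)),
):
    """Shift channel groups of x (B, T, C, H, W) along time and space.

    Each channel group gets a different (frame, row, column) displacement,
    so a plain 2D convolution applied afterwards can mix features from
    neighboring frames and distant spatial locations.
    """
    offsets = [
        (dt, dy, dx) for dt in temporal_shifts for (dy, dx) in spatial_shifts
    ]
    # Split channels into one group per offset (C is assumed divisible).
    groups = torch.chunk(x, len(offsets), dim=2)
    shifted = []
    for g, (dt, dy, dx) in zip(groups, offsets):
        g = torch.roll(g, shifts=(dt, dy, dx), dims=(1, 3, 4))
        # Zero the frames that torch.roll wrapped around the clip boundary.
        # (Spatial wrap-around is left untreated for brevity; a fuller
        # implementation would zero those borders as well.)
        if dt > 0:
            g[:, :dt] = 0
        elif dt < 0:
            g[:, dt:] = 0
        shifted.append(g)
    return torch.cat(shifted, dim=2)


# Example: an 8-frame clip with 64 channels; the output keeps the same
# shape, but channel groups now carry features from neighboring frames.
x = torch.randn(2, 8, 64, 128, 128)
y = grouped_spatial_temporal_shift(x)
assert y.shape == x.shape
```

A useful way to read this sketch: the shift itself has no learnable parameters and costs only memory movement, which is why pairing it with plain 2D convolutions keeps the overall framework so light compared to flow estimation or cross-frame attention.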


Datasets

GoPro, DVD
Results from the Paper


 Ranked #1 on Deblurring on GoPro (using extra training data)

Task        Dataset  Model       Metric  Value   Global Rank
Deblurring  DVD      GShift-Net  PSNR    34.69   #2
Deblurring  DVD      GShift-Net  SSIM    0.969   #2
Deblurring  GoPro    GShift-Net  PSNR    35.88   #1
Deblurring  GoPro    GShift-Net  SSIM    0.979   #1

Methods


Grouped Spatial-Temporal Shift, 2D Convolution