Video Representation Learning by Dense Predictive Coding

10 Sep 2019 · Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions: First, we introduce the Dense Predictive Coding (DPC) framework for self-supervised representation learning on videos...

| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Self-Supervised Action Recognition | HMDB51 | DPC (Modified 3D ResNet-18) | Top-1 Accuracy | 34.5 | # 3 |
| | | | Pre-Training Dataset | Kinetics400 | # 1 |
| Self-Supervised Action Recognition | HMDB51 | DPC (Modified 3D ResNet-34) | Top-1 Accuracy | 35.7 | # 2 |
| | | | Pre-Training Dataset | Kinetics400 | # 1 |
| Self-Supervised Action Recognition | UCF101 | DPC (3D ResNet-18) | 3-fold Accuracy | 68.2 | # 4 |
| | | | Pre-Training Dataset | Kinetics400 | # 1 |
| Self-Supervised Action Recognition | UCF101 | DPC (3D ResNet-18, Split 1) | 3-fold Accuracy | 60.6 | # 9 |
| | | | Pre-Training Dataset | UCF101 | # 1 |
| Self-Supervised Action Recognition | UCF101 | DPC (3D ResNet-34) | 3-fold Accuracy | 75.7 | # 1 |
| | | | Pre-Training Dataset | Kinetics400 | # 1 |

Methods used in the Paper