Temporally Coherent Embeddings for Self-Supervised Video Representation Learning

21 Mar 2020 · Joshua Knights, Ben Harwood, Daniel Ward, Anthony Vanderkop, Olivia Mackenzie-Ross, Peyman Moghadam

This paper presents TCE: Temporally Coherent Embeddings for self-supervised video representation learning. The proposed method exploits inherent structure of unlabeled video data to explicitly enforce temporal coherency in the embedding space, rather than indirectly learning it through ranking or predictive proxy tasks...

Task                                Dataset  Model                     Metric Name           Metric Value  Global Rank
Self-Supervised Action Recognition  HMDB51   TCE (ResNet-18)           Top-1 Accuracy        34.2          # 4
Self-Supervised Action Recognition  HMDB51   TCE (ResNet-18)           Pre-Training Dataset  Kinetics400   # 1
Self-Supervised Action Recognition  HMDB51   TCE (ResNet-50)           Top-1 Accuracy        36.6          # 1
Self-Supervised Action Recognition  HMDB51   TCE (ResNet-50)           Pre-Training Dataset  Kinetics400   # 1
Self-Supervised Action Recognition  UCF101   TCE (ResNet-18, Split 1)  3-fold Accuracy       68.2          # 4
Self-Supervised Action Recognition  UCF101   TCE (ResNet-18, Split 1)  Pre-Training Dataset  UCF101        # 1
Self-Supervised Action Recognition  UCF101   TCE (ResNet-18, Split 1)  3-fold Accuracy       68.8          # 3
Self-Supervised Action Recognition  UCF101   TCE (ResNet-18, Split 1)  Pre-Training Dataset  Kinetics400   # 1
Self-Supervised Action Recognition  UCF101   TCE (ResNet-50)           3-fold Accuracy       71.2          # 2
Self-Supervised Action Recognition  UCF101   TCE (ResNet-50)           Pre-Training Dataset  Kinetics400   # 1

Methods used in the Paper