Learning and Using the Arrow of Time

We seek to understand the arrow of time in videos: what makes videos look like they are playing forwards or backwards? Can we visualize the cues? Can the arrow of time serve as a supervisory signal for activity analysis? To this end, we build three large-scale video datasets and apply a learning-based approach to these tasks. To learn the arrow of time efficiently and reliably, we design a ConvNet suitable for extended temporal footprints and for class activation visualization, and we study the effect of artificial cues, such as cinematographic conventions, on learning. Our trained model achieves state-of-the-art performance on large-scale real-world video datasets. Through cluster analysis and localization of the regions most important for the prediction, we examine learned visual cues that are consistent across many samples and show when and where they occur. Lastly, we use the trained ConvNet for two applications: self-supervision for action recognition, and video forensics, namely determining whether Hollywood film clips have been deliberately reversed in time, a technique often used for special effects.
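The self-supervisory signal comes for free: any unlabeled clip yields a "forward" example, and the same frames in reverse order yield a "backward" example. The following is a minimal sketch of that labeling step only, not the authors' pipeline. It assumes raw frames stored as a NumPy array of shape (T, H, W, C) and a label convention of 1 = forward, 0 = backward; the abstract specifies neither. The helpers `arrow_of_time_pairs` and `sample_subclip` are hypothetical names chosen for illustration. Motion-based inputs such as optical flow would need more than reordering frames.

```python
import numpy as np

# Hypothetical label convention (not specified in the abstract).
FORWARD, BACKWARD = 1, 0


def arrow_of_time_pairs(clip):
    """Build one forward and one time-reversed example from a clip.

    clip: array of shape (T, H, W, C), frames in original playback order.
    Returns a list of (frames, label) tuples.
    """
    clip = np.asarray(clip)
    if clip.ndim != 4:
        raise ValueError("expected clip of shape (T, H, W, C)")
    # Reverse along the time axis only; spatial layout is untouched.
    reversed_clip = clip[::-1].copy()
    return [(clip, FORWARD), (reversed_clip, BACKWARD)]


def sample_subclip(clip, length, rng):
    """Randomly crop `length` consecutive frames (hypothetical helper)."""
    num_frames = clip.shape[0]
    if length > num_frames:
        raise ValueError("subclip longer than clip")
    start = int(rng.integers(0, num_frames - length + 1))
    return clip[start:start + length]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 30 synthetic frames standing in for a real video.
    video = rng.random((30, 8, 8, 3))
    sub = sample_subclip(video, 10, rng)
    for frames, label in arrow_of_time_pairs(sub):
        print(frames.shape, label)
```

Because each clip contributes one example of each class, the binary task is balanced by construction. This balance is also why the abstract stresses artificial cues: the classifier should learn genuine temporal asymmetry rather than shortcuts that happen to separate the two classes.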

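Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Self-Supervised Action Recognition | UCF101 | Arrow of Time (AlexNet) | 3-fold Accuracy | 55.3 | #50
Self-Supervised Action Recognition | UCF101 | Arrow of Time (AlexNet) | Pre-Training Dataset | UCF101 | #1
Self-Supervised Action Recognition | UCF101 | Arrow of Time (AlexNet) | Frozen | false | #1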