TSM: Temporal Shift Module for Efficient Video Understanding

ICCV 2019 • Ji Lin • Chuang Gan • Song Han

The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN-based methods can achieve good performance but are computationally intensive, making it expensive to deploy...
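The cost gap between 2D and 3D convolutions described above can be made concrete with a rough back-of-the-envelope count of multiply-accumulate operations (MACs). This is only an illustrative sketch: the layer sizes (64 channels, 3x3 kernel, 56x56 feature map, 8 frames) and the helper names `conv2d_macs` and `conv3d_macs` are assumptions chosen for the example, not figures from the paper, and the count assumes stride 1, "same" padding, and no bias.

```python
def conv2d_macs(c_in, c_out, k, h, w):
    """MACs for one k x k 2D convolution over a single h x w frame."""
    return c_in * c_out * k * k * h * w


def conv3d_macs(c_in, c_out, k, t, h, w):
    """MACs for one k x k x k 3D convolution over a t x h x w clip."""
    return c_in * c_out * k * k * k * t * h * w


# Hypothetical layer and clip sizes (assumptions for illustration only).
frames, channels, kernel, height, width = 8, 64, 3, 56, 56

# A 2D CNN processes each frame independently, so its cost scales with the frame count.
macs_2d = frames * conv2d_macs(channels, channels, kernel, height, width)

# A 3D CNN also convolves along time, adding a factor of `kernel` per output position.
macs_3d = conv3d_macs(channels, channels, kernel, frames, height, width)

ratio = macs_3d / macs_2d
print(f"2D: {macs_2d:,} MACs, 3D: {macs_3d:,} MACs, ratio: {ratio:.1f}x")
```

Under these assumptions, the 3D layer costs a factor equal to the temporal kernel size more than applying the 2D layer frame by frame, while the 2D layer mixes no information across frames at all.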

| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|---|---|---|---|---|---|
| Video Object Detection | ImageNet VID | Online TSM | MAP | 76.3 | # 5 |
| Action Recognition In Videos | Something-Something V1 | TSM (RGB + Flow) | Top-1 Accuracy | 50.7 | # 6 |
| Action Recognition In Videos | Something-Something V2 | TSM (RGB + Flow) | Top-1 Accuracy | 66.6 | # 1 |
| Action Recognition In Videos | Something-Something V2 | TSM (RGB + Flow) | Top-5 Accuracy | 91.3 | # 2 |
