TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Classification	Kinetics-400	TDN-ResNet101 (ensemble, ImageNet pretrained, RGB only)	Acc@1	79.4	# 103
Action Classification	Kinetics-400	TDN-ResNet101 (ensemble, ImageNet pretrained, RGB only)	Acc@5	94.4	# 70
Action Recognition	Something-Something V1	TDN ResNet101 (one clip, center crop, 8+16 ensemble, ImageNet pretrained, RGB only)	Top 1 Accuracy	56.8	# 17
Action Recognition	Something-Something V1	TDN ResNet101 (one clip, center crop, 8+16 ensemble, ImageNet pretrained, RGB only)	Top 5 Accuracy	84.1	# 10
Action Recognition	Something-Something V2	TDN ResNet101 (one clip, center crop, 8+16 ensemble, ImageNet pretrained, RGB only)	Top-1 Accuracy	68.2	# 51
Action Recognition	Something-Something V2	TDN ResNet101 (one clip, center crop, 8+16 ensemble, ImageNet pretrained, RGB only)	Top-5 Accuracy	91.6	# 34
Action Recognition	Something-Something V2	TDN ResNet101 (one clip, center crop, 8+16 ensemble, ImageNet pretrained, RGB only)	GFLOPs	198x1	# 6
Action Recognition	Something-Something V2	TDN ResNet101 (one clip, three crop, 8+16 ensemble, ImageNet pretrained, RGB only)	Top-1 Accuracy	69.6	# 42
Action Recognition	Something-Something V2	TDN ResNet101 (one clip, three crop, 8+16 ensemble, ImageNet pretrained, RGB only)	Top-5 Accuracy	92.2	# 29
Action Recognition	Something-Something V2	TDN ResNet101 (one clip, three crop, 8+16 ensemble, ImageNet pretrained, RGB only)	GFLOPs	198x3	# 6
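
The GFLOPs entries above appear to follow the common "cost per view x number of views" convention, so `198x3` would mean 198 GFLOPs for each of three crops. Under that assumption, the total inference cost per video can be sketched as below. The helper name `total_gflops` is hypothetical and is not part of the TDN code base.

```python
def total_gflops(cost: str) -> float:
    """Parse a 'per-view GFLOPs x views' string such as '198x3'.

    Assumes the convention 'cost per view' x 'number of views';
    this is an interpretation of the table, not an official definition.
    """
    per_view, views = cost.lower().split("x")
    return float(per_view) * int(views)


# Values taken from the Something-Something V2 rows of the table.
center_crop_cost = total_gflops("198x1")  # one clip, center crop
three_crop_cost = total_gflops("198x3")   # one clip, three crops
```

Read this way, the three-crop setting trades roughly three times the compute for the gain from 68.2 to 69.6 Top-1 accuracy reported in the table.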

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tdn-temporal-difference-networks-for/action-recognition-in-videos-on-something-1)](https://paperswithcode.com/sota/action-recognition-in-videos-on-something-1?p=tdn-temporal-difference-networks-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tdn-temporal-difference-networks-for/action-recognition-in-videos-on-something)](https://paperswithcode.com/sota/action-recognition-in-videos-on-something?p=tdn-temporal-difference-networks-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tdn-temporal-difference-networks-for/action-classification-on-kinetics-400)](https://paperswithcode.com/sota/action-classification-on-kinetics-400?p=tdn-temporal-difference-networks-for)`

TDN: Temporal Difference Networks for Efficient Action Recognition

CVPR 2021 · Limin Wang, Zhan Tong, Bin Ji, Gangshan Wu

Temporal modeling remains challenging for action recognition in videos. To mitigate this issue, this paper presents a new video architecture, termed Temporal Difference Network (TDN), with a focus on capturing multi-scale temporal information for efficient action recognition. The core of our TDN is an efficient temporal module (TDM) that explicitly leverages a temporal difference operator, and we systematically assess its effect on short-term and long-term motion modeling. To fully capture temporal information over the entire video, our TDN is established with a two-level difference modeling paradigm. Specifically, for local motion modeling, the temporal difference over consecutive frames is used to supply 2D CNNs with finer motion patterns, while for global motion modeling, the temporal difference across segments is incorporated to capture long-range structure for motion feature excitation. TDN provides a simple and principled temporal modeling framework and can be instantiated with existing CNNs at a small extra computational cost. Our TDN sets a new state of the art on the Something-Something V1 & V2 datasets and is on par with the best performance on the Kinetics-400 dataset. In addition, we conduct in-depth ablation studies and present visualization results for our TDN, hopefully providing insightful analysis of temporal difference modeling. We release the code at https://github.com/MCG-NJU/TDN.
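
To make the two-level idea in the abstract concrete, here is a minimal sketch of a plain temporal difference operator applied at two time scales. It is only an illustration under stated assumptions: frames are toy flattened lists of floats, the function name `temporal_difference` is hypothetical, and the long-term step differences sampled frames directly, whereas the abstract describes using cross-segment differences for feature excitation inside the network. It is not the authors' TDM implementation.

```python
from typing import List

Frame = List[float]  # toy stand-in for a flattened frame or feature vector


def temporal_difference(sequence: List[Frame], stride: int = 1) -> List[Frame]:
    """Return element-wise differences sequence[t + stride] - sequence[t]."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return [
        [b - a for a, b in zip(sequence[t], sequence[t + stride])]
        for t in range(len(sequence) - stride)
    ]


# Toy "video": 6 frames of 2 pixels each (values chosen for illustration only).
video = [[0.0, 0.0], [1.0, 0.0], [3.0, 1.0], [3.0, 1.0], [6.0, 2.0], [10.0, 2.0]]

# Short-term (local) motion: differences between consecutive frames.
local_diffs = temporal_difference(video, stride=1)

# Long-term (global) motion: take one sample per segment (every 2nd frame here,
# giving 3 segments), then difference adjacent segment samples.
segment_samples = video[::2]
global_diffs = temporal_difference(segment_samples, stride=1)
```

The local differences react to frame-to-frame change, while the segment-level differences summarize change over a longer span; the paper's contribution lies in how such signals are fed into 2D CNNs, which this sketch does not attempt to reproduce.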

Code

MCG-NJU/TDN (official) · 362 stars

Tasks

Action Classification

Action Recognition

Action Recognition In Videos

Datasets

Kinetics

Kinetics 400

Something-Something V2

Something-Something V1

Results from the Paper

Ranked #17 on Action Recognition on Something-Something V1

Methods

TDN
