TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Recognition	Diving-48	PSB	Accuracy	86	# 9
Action Classification	Kinetics-400	TPS	Acc@1	82.5	# 69
Action Recognition	Something-Something V1	TPS	Top 1 Accuracy	58.3	# 10
Action Recognition	Something-Something V2	TPS	Top-1 Accuracy	69.8	# 39

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

27 Jul 2022 · Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Xian-Sheng Hua, Lei Zhang

Transformer-based methods have recently made great advances on 2D image-based vision tasks. For 3D video-based tasks such as action recognition, however, directly applying spatiotemporal transformers to video data imposes heavy computation and memory burdens because of the greatly increased number of patches and the quadratic complexity of self-attention. How to model the 3D self-attention of video data efficiently and effectively has been a major challenge for transformers. In this paper, we propose a Temporal Patch Shift (TPS) method for efficient 3D self-attention modeling in transformers for video-based action recognition. TPS shifts a subset of patches along the temporal dimension following a specific mosaic pattern, thus converting a vanilla spatial self-attention operation into a spatiotemporal one at little additional cost. As a result, we can compute 3D self-attention using nearly the same computation and memory cost as 2D self-attention. TPS is a plug-and-play module and can be inserted into existing 2D transformer models to enhance spatiotemporal feature learning. The proposed method achieves performance competitive with state-of-the-art methods on Something-Something V1 & V2, Diving-48, and Kinetics-400 while being much more efficient in computation and memory cost. The source code of TPS can be found at https://github.com/MartinXM/TPS.
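
To make the idea in the abstract concrete, here is a minimal sketch of a temporal patch shift on a toy clip, along with a count of the query-key pairs that joint 3D attention and per-frame 2D attention would compare. It is not the authors' implementation: the function names, the offset pattern, and the cyclic wrap at the clip boundaries are all assumptions made for illustration, and the official repository should be consulted for the actual patterns.

```python
def temporal_patch_shift(frames, offsets):
    """Shift patches along time according to per-position offsets.

    frames:  list of T frames, each a list of N patch tokens.
    offsets: list of N ints; offsets[n] = k means position n in frame t
             takes its token from frame (t + k) % T. The cyclic wrap is
             an assumption of this sketch.
    """
    num_frames = len(frames)
    return [
        [frames[(t + offsets[n]) % num_frames][n] for n in range(len(offsets))]
        for t in range(num_frames)
    ]


def attention_pairs(num_frames, patches_per_frame, joint):
    """Count query-key pairs for joint 3D vs. per-frame 2D self-attention."""
    if joint:
        # Every token attends to every token in the whole clip.
        return (num_frames * patches_per_frame) ** 2
    # Each frame attends only within itself.
    return num_frames * patches_per_frame ** 2


# Toy clip: 4 frames, 4 patch positions; each token is labelled (frame, position).
clip = [[(t, n) for n in range(4)] for t in range(4)]
pattern = [0, 1, -1, 0]  # hypothetical pattern, not taken from the paper
shifted = temporal_patch_shift(clip, pattern)

# Frame 1 now mixes patches from frames 1, 2, 0 and 1, so ordinary
# per-frame (spatial) attention over it sees temporal context.
print(shifted[1])

# For 8 frames of 196 patches, joint attention compares 8x more pairs.
print(attention_pairs(8, 196, joint=True), attention_pairs(8, 196, joint=False))
```

Because the shift only re-indexes tokens, the attention that follows costs the same as plain per-frame attention; in this toy count the gap to joint 3D attention grows linearly with the number of frames.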

Code

martinxm/tps official

Tasks

Action Classification

Action Recognition

Datasets

Kinetics

Kinetics 400

Something-Something V2

Something-Something V1

Results from the Paper

Ranked #9 on Action Recognition on Diving-48
