TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Skeleton Based Action Recognition	NTU RGB+D	SkateFormer	Accuracy (CV)	97.8	# 2
Skeleton Based Action Recognition	NTU RGB+D	SkateFormer	Accuracy (CS)	93.5	# 4
Skeleton Based Action Recognition	NTU RGB+D	SkateFormer	Ensembled Modalities	4	# 2
Human Interaction Recognition	NTU RGB+D	SkateFormer	Accuracy (Cross-Subject)	97.1	# 1
Human Interaction Recognition	NTU RGB+D	SkateFormer	Accuracy (Cross-View)	99.3	# 1
Skeleton Based Action Recognition	NTU RGB+D 120	SkateFormer	Accuracy (Cross-Subject)	89.8	# 9
Skeleton Based Action Recognition	NTU RGB+D 120	SkateFormer	Accuracy (Cross-Setup)	91.4	# 5
Skeleton Based Action Recognition	NTU RGB+D 120	SkateFormer	Ensembled Modalities	4	# 1
Human Interaction Recognition	NTU RGB+D 120	SkateFormer	Accuracy (Cross-Subject)	92.3	# 1
Human Interaction Recognition	NTU RGB+D 120	SkateFormer	Accuracy (Cross-Setup)	93.2	# 1
Skeleton Based Action Recognition	N-UCLA	SkateFormer	Accuracy	98.3	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/skateformer-skeletal-temporal-transformer-for/human-interaction-recognition-on-ntu-rgb-d)](https://paperswithcode.com/sota/human-interaction-recognition-on-ntu-rgb-d?p=skateformer-skeletal-temporal-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/skateformer-skeletal-temporal-transformer-for/human-interaction-recognition-on-ntu-rgb-d-1)](https://paperswithcode.com/sota/human-interaction-recognition-on-ntu-rgb-d-1?p=skateformer-skeletal-temporal-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/skateformer-skeletal-temporal-transformer-for/skeleton-based-action-recognition-on-n-ucla)](https://paperswithcode.com/sota/skeleton-based-action-recognition-on-n-ucla?p=skateformer-skeletal-temporal-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/skateformer-skeletal-temporal-transformer-for/skeleton-based-action-recognition-on-ntu-rgbd)](https://paperswithcode.com/sota/skeleton-based-action-recognition-on-ntu-rgbd?p=skateformer-skeletal-temporal-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/skateformer-skeletal-temporal-transformer-for/skeleton-based-action-recognition-on-ntu-rgbd-1)](https://paperswithcode.com/sota/skeleton-based-action-recognition-on-ntu-rgbd-1?p=skateformer-skeletal-temporal-transformer-for)`

SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition

14 Mar 2024 · Jeonghyeok Do, Munchurl Kim

Skeleton-based action recognition, which classifies human actions based on the coordinates of joints and their connectivity within skeleton data, is widely utilized in various scenarios. While Graph Convolutional Networks (GCNs) have been proposed for skeleton data represented as graphs, they suffer from limited receptive fields constrained by joint connectivity. To address this limitation, recent advancements have introduced transformer-based methods. However, capturing correlations between all joints in all frames requires substantial memory resources. To alleviate this, we propose a novel approach called Skeletal-Temporal Transformer (SkateFormer) that partitions joints and frames based on different types of skeletal-temporal relation (Skate-Type) and performs skeletal-temporal self-attention (Skate-MSA) within each partition. We categorize the key skeletal-temporal relations for action recognition into a total of four distinct types. These types combine (i) two skeletal relation types based on physically neighboring and distant joints, and (ii) two temporal relation types based on neighboring and distant frames. Through this partition-specific attention strategy, our SkateFormer can selectively focus on key joints and frames crucial for action recognition in an action-adaptive manner with efficient computation. Extensive experiments on various benchmark datasets validate that our SkateFormer outperforms recent state-of-the-art methods.
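The partition-specific attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that joints are ordered so that contiguous indices are physically neighboring, that "neighboring" groups are contiguous index windows, and that "distant" groups are strided index sets. The function names (`partition`, `self_attention`), the toy sizes, and the single-head attention with identity projections are all hypothetical simplifications. The sketch also counts attention-matrix entries to show why attending within partitions needs less memory than attending over all joints in all frames.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def partition(x, joint_mode, frame_mode, n_frames, n_joints):
    """Split x of shape (T, V, C) into groups of n_frames * n_joints tokens.

    "near" groups contiguous indices; "far" groups strided indices.
    Assumption: contiguous joint indices are physically neighboring joints.
    """
    T, V, C = x.shape
    M, K = T // n_frames, V // n_joints  # number of frame groups / joint groups
    if frame_mode == "near":
        x = x.reshape(M, n_frames, V, C)  # group m holds frames m*N .. m*N + N - 1
    else:
        x = x.reshape(n_frames, M, V, C).transpose(1, 0, 2, 3)  # frames m, m+M, ...
    if joint_mode == "near":
        x = x.reshape(M, n_frames, K, n_joints, C)  # group k holds joints k*L .. k*L + L - 1
    else:
        x = x.reshape(M, n_frames, n_joints, K, C).transpose(0, 1, 3, 2, 4)  # joints k, k+K, ...
    # (M, N, K, L, C) -> (M, K, N, L, C) -> (M*K groups, N*L tokens, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(M * K, n_frames * n_joints, C)

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(d)  # (groups, S, S)
    return softmax(scores) @ tokens

# Toy sizes (hypothetical): 8 frames, 4 joints, 6 channels.
T, V, C = 8, 4, 6
N, L = 4, 2  # frames per group, joints per group
x = np.random.default_rng(0).standard_normal((T, V, C))

# Four relation types: {near, far} joints combined with {near, far} frames.
outputs = {}
for joint_mode in ("near", "far"):
    for frame_mode in ("near", "far"):
        groups = partition(x, joint_mode, frame_mode, N, L)
        outputs[(joint_mode, frame_mode)] = self_attention(groups)

# Attention-matrix entries: all tokens at once vs. one partitioning.
full_entries = (T * V) ** 2
part_entries = (T // N) * (V // L) * (N * L) ** 2
print(full_entries, part_entries)
```

Under these assumptions, each partitioning attends over `N * L` tokens per group instead of all `T * V` tokens, so the attention matrices shrink by a factor equal to the number of groups.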

Code

KAIST-VICLab/SkateFormer official

Tasks

Action Recognition

Human Interaction Recognition

Skeleton Based Action Recognition

Datasets

NTU RGB+D

NTU RGB+D 120

N-UCLA

Results from the Paper

Ranked #1 on Skeleton Based Action Recognition on N-UCLA

Methods

Absolute Position Encodings • BPE • EfficientNet • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
