Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

CVPR 2021  ·  Hehe Fan, Yi Yang, Mohan Kankanhalli

Point cloud videos are irregular and unordered along the spatial dimension, and points emerge inconsistently across frames. To capture the dynamics in point cloud videos, point tracking is usually employed. However, as points may flow in and out across frames, computing accurate point trajectories is extremely difficult. Moreover, tracking usually relies on point colors and thus may fail on colorless point clouds. In this paper, to avoid point tracking, we propose a novel Point 4D Transformer (P4Transformer) network to model raw point cloud videos. Specifically, P4Transformer consists of (i) a point 4D convolution to embed the spatio-temporal local structures present in a point cloud video and (ii) a transformer to capture the appearance and motion information across the entire video by performing self-attention on the embedded local features. In this fashion, related or similar local areas are merged by attention weights rather than by explicit tracking. Extensive experiments, including 3D action recognition and 4D semantic segmentation, on four benchmarks demonstrate the effectiveness of our P4Transformer for point cloud video modeling.
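To make the two-stage design concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: a point 4D convolution embeds spatio-temporal local neighborhoods into tokens, and a transformer encoder applies self-attention over those tokens so that related local areas are merged by attention rather than tracked. All module names (`Point4DConvSketch`, `P4TransformerSketch`), tensor shapes, and hyper-parameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the P4Transformer pipeline; shapes and
# hyper-parameters are assumptions, not the paper's configuration.
import torch
import torch.nn as nn


class Point4DConvSketch(nn.Module):
    """Toy stand-in for the point 4D convolution: for each anchor point,
    embed the (dx, dy, dz, dt) displacements of its spatio-temporal
    neighbors with a shared MLP, then max-pool over the neighborhood."""

    def __init__(self, out_dim: int = 128):
        super().__init__()
        # 4 = (dx, dy, dz, dt) displacement of a neighbor from its anchor.
        self.mlp = nn.Sequential(
            nn.Linear(4, 64), nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, displacements: torch.Tensor) -> torch.Tensor:
        # displacements: (B, N, K, 4) -> anchor embeddings: (B, N, out_dim)
        feats = self.mlp(displacements)   # (B, N, K, out_dim)
        return feats.max(dim=2).values    # max-pool over the K neighbors


class P4TransformerSketch(nn.Module):
    """Local embedding followed by self-attention across the whole video."""

    def __init__(self, dim: int = 128, num_classes: int = 60):
        super().__init__()
        self.embed = Point4DConvSketch(out_dim=dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, displacements: torch.Tensor) -> torch.Tensor:
        tokens = self.embed(displacements)    # (B, N, dim) local embeddings
        tokens = self.transformer(tokens)     # attention merges related areas
        return self.head(tokens.mean(dim=1))  # global pooling + classifier


if __name__ == "__main__":
    # 2 videos, 32 spatio-temporal anchors, 16 neighbors per anchor.
    x = torch.randn(2, 32, 16, 4)
    logits = P4TransformerSketch()(x)
    print(logits.shape)  # torch.Size([2, 60])
```

The key design point the abstract emphasizes is visible in the second module: instead of linking points across frames with explicit trajectories, self-attention lets every embedded local area attend to every other, so correspondence is learned implicitly through attention weights.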


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value (%) | Global Rank |
|------|---------|-------|-------------|------------------|-------------|
| 3D Action Recognition | NTU RGB+D | P4Transformer | Cross Subject Accuracy | 90.2 | #4 |
| 3D Action Recognition | NTU RGB+D | P4Transformer | Cross View Accuracy | 96.4 | #3 |
