VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation

25 May 2022  ·  Yuxing Chen, Renshu Gu, Ouhan Huang, Gangyong Jia

This paper presents the Volumetric Transformer Pose estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints across all camera views and directly learns the spatial relationships in the 3D voxel space in an end-to-end fashion. The aggregated 3D features are passed through 3D convolutions before being flattened into sequential embeddings and fed into a transformer. A residual structure is designed to further improve the performance. In addition, sparse Sinkhorn attention is employed to reduce the memory cost, a major bottleneck for volumetric representations, while still achieving excellent performance. The output of the transformer is again concatenated with the 3D convolutional features through a residual design. The proposed VTP framework combines the high performance of transformers with volumetric representations and can serve as a good alternative to convolutional backbones. Experiments on the Shelf, Campus and CMU Panoptic benchmarks show promising results in terms of both Mean Per Joint Position Error (MPJPE) and Percentage of Correctly estimated Parts (PCP). Our code will be made available.
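The pipeline described above (3D convolutions over aggregated voxel features, flattening into sequential embeddings, a transformer, and a residual combination with the convolutional features) can be illustrated with a minimal sketch. This is not the authors' code: the module names, tensor shapes, and the use of standard multi-head attention in place of sparse Sinkhorn attention are assumptions for illustration, and the multi-view 2D-to-3D feature aggregation step is omitted.

```python
# Minimal sketch of the VTP pipeline described in the abstract (assumed PyTorch).
# Standard multi-head attention stands in for the sparse Sinkhorn attention used
# in the paper; shapes and hyperparameters are illustrative only.
import torch
import torch.nn as nn


class VTPSketch(nn.Module):
    def __init__(self, in_channels=32, embed_dim=128, num_heads=4, num_layers=2):
        super().__init__()
        # 3D convolutions over the voxel feature volume aggregated from all views
        self.conv3d = nn.Sequential(
            nn.Conv3d(in_channels, embed_dim, kernel_size=3, padding=1),
            nn.BatchNorm3d(embed_dim),
            nn.ReLU(inplace=True),
        )
        # Transformer encoder over the flattened voxel tokens
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, voxel_feats):
        # voxel_feats: (B, C, X, Y, Z) features aggregated from 2D keypoint
        # heatmaps of all camera views (aggregation itself is omitted here)
        x = self.conv3d(voxel_feats)                   # (B, D, X, Y, Z)
        b, d, xs, ys, zs = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, X*Y*Z, D) sequential embeddings
        out = self.transformer(tokens)                 # spatial relations in voxel space
        out = out.transpose(1, 2).reshape(b, d, xs, ys, zs)
        # Residual design: concatenate transformer output with the 3D conv features
        return torch.cat([out, x], dim=1)              # (B, 2*D, X, Y, Z)


if __name__ == "__main__":
    feats = torch.randn(1, 32, 8, 8, 8)
    print(VTPSketch()(feats).shape)  # torch.Size([1, 256, 8, 8, 8])
```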


Datasets

Campus · Shelf · CMU Panoptic

Results from the Paper


Ranked #4 on 3D Human Pose Estimation on Panoptic (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| 3D Multi-Person Pose Estimation | Campus | VTP | PCP3D | 96.3 | #12 |
| 3D Multi-Person Pose Estimation | Campus | VTP | Mean mAP | 80.1 | #1 |
| 3D Human Pose Estimation | Panoptic | VTP | Average MPJPE (mm) | 17.62 | #4 |
| 3D Multi-Person Pose Estimation | Shelf | VTP | PCP3D | 97.3 | #13 |
| 3D Multi-Person Pose Estimation | Shelf | VTP | MPJPE | 56.3 | #1 |

Methods