XFormer: Fast and Accurate Monocular 3D Body Capture

We present XFormer, a novel human mesh and motion capture method that achieves real-time performance on consumer CPUs given only monocular images as input. The proposed network architecture contains two branches: a keypoint branch that estimates 3D human mesh vertices from 2D keypoints, and an image branch that makes predictions directly from RGB image features. At the core of our method is a cross-modal transformer block that allows information to flow between these two branches by modeling the attention between 2D keypoint coordinates and image spatial features. This design enables training on heterogeneous data, including images with 2D/3D annotations, images with 3D pseudo labels, and motion capture datasets that have no associated images, which substantially improves the accuracy and generalization ability of our system. Built on a lightweight backbone (MobileNetV3), our method runs in real time (over 30 fps on a single CPU core) while still yielding competitive accuracy. Furthermore, with an HRNet backbone, XFormer delivers state-of-the-art performance on the Human3.6M and 3DPW datasets.
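No code has been released for the paper, but the core idea of the cross-modal block, letting keypoint tokens attend over image spatial features via scaled dot-product attention, can be sketched as follows. All names, token counts, and dimensions here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # scaled dot-product attention: each query attends over all keys/values
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

# hypothetical sizes: 17 keypoint tokens, 49 image-patch tokens (7x7 map), dim 64
rng = np.random.default_rng(0)
kp_tokens = rng.standard_normal((17, 64))    # embedded 2D keypoint coordinates
img_tokens = rng.standard_normal((49, 64))   # flattened image spatial features

# one direction of the cross-modal block: the keypoint branch
# queries the image features and fuses the result residually
fused = kp_tokens + cross_attention(kp_tokens, img_tokens, img_tokens)
print(fused.shape)  # (17, 64)
```

In the full model this exchange would run in both directions (image tokens also attend to keypoint tokens) with learned query/key/value projections and multiple heads; the sketch keeps only the attention mechanics.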


Results from the Paper


Task                       Dataset        Model            Metric              Value   Global Rank
3D Human Pose Estimation   3DPW           XFormer (HRNet)  PA-MPJPE            45.7    #39
                                                           MPJPE               75      #41
                                                           MPVPE               87.1    #31
3D Human Pose Estimation   Human3.6M      XFormer (HRNet)  Average MPJPE (mm)  52.6    #197
                                                           PA-MPJPE            35.2    #26
3D Human Pose Estimation   MPI-INF-3DHP   XFormer (HRNet)  MPJPE               109.8   #76
                                                           PA-MPJPE            64.5    #16
