Task	Dataset	Model	Metric Name	Metric Value	Global Rank
3D Human Pose Estimation	3DPW	INT-2 (ResNet-50)	PA-MPJPE	42.0	#19
3D Human Pose Estimation	3DPW	INT-2 (ResNet-50)	MPJPE	75.6	#45
3D Human Pose Estimation	3DPW	INT-2 (ResNet-50)	MPVPE	87.9	#35
3D Human Pose Estimation	3DPW	INT-2 (ResNet-50)	Acceleration Error	16.5	#17
3D Human Pose Estimation	Human3.6M	INT-2 (ResNet-50)	Average MPJPE (mm)	54.9	#218
3D Human Pose Estimation	Human3.6M	INT-2 (ResNet-50)	PA-MPJPE	38.4	#39

Capturing the motion of every joint: 3D human pose and shape estimation with independent tokens

1 Mar 2023 · Sen Yang, Wen Heng, Gang Liu, Guozhong Luo, Wankou Yang, Gang Yu

In this paper we present a novel method to estimate 3D human pose and shape from monocular videos. This task requires directly recovering pixel-aligned 3D human pose and body shape from monocular images or videos, which is challenging due to its inherent ambiguity. To improve precision, existing methods rely heavily on the initialized mean pose and shape as prior estimates and on parameter regression in an iterative error-feedback manner. In addition, video-based approaches model the overall change over the image-level features to temporally enhance the single-frame feature, but fail to capture the rotational motion at the joint level and cannot guarantee local temporal consistency. To address these issues, we propose a novel Transformer-based model with a design of independent tokens. First, we introduce three types of tokens independent of the image feature: joint rotation tokens, a shape token, and a camera token. By progressively interacting with image features through Transformer layers, these tokens learn to encode the prior knowledge of human 3D joint rotations, body shape, and position information from large-scale data, and are updated to estimate SMPL parameters conditioned on a given image. Second, benefiting from the proposed token-based representation, we further use a temporal model to focus on capturing the rotational temporal information of each joint, which is empirically conducive to preventing large jitters in local parts. Despite being conceptually simple, the proposed method attains superior performance on the 3DPW and Human3.6M datasets. Using ResNet-50 and Transformer architectures, it obtains 42.0 mm error on the PA-MPJPE metric of the challenging 3DPW, outperforming state-of-the-art counterparts by a large margin. Code will be publicly available at https://github.com/yangsenius/INT_HMR_Model
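
The independent-token idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the weights are random, attention is single-head, and the specific sizes (24 joint tokens, a 6D rotation output per joint, 10 shape coefficients, 3 camera parameters, a 7x7 feature map) are assumptions chosen for illustration. All names such as `token_update` and `estimate_smpl` are hypothetical.

```python
import numpy as np

NUM_JOINTS = 24  # assumed number of SMPL joint rotations (incl. global orientation)
DIM = 64         # hypothetical token / feature width

rng = np.random.default_rng(0)

def init(*shape):
    """Random stand-in for learned parameters."""
    return rng.normal(size=shape) / np.sqrt(shape[0])

# Tokens are parameters of the model, not functions of the input image.
joint_tokens = init(NUM_JOINTS, DIM)
shape_token = init(1, DIM)
camera_token = init(1, DIM)

# Per-layer attention weights and output heads (all assumed sizes).
layers = [(init(DIM, DIM), init(DIM, DIM), init(DIM, DIM)) for _ in range(2)]
w_pose = init(DIM, 6)    # assumed 6D rotation representation per joint
w_shape = init(DIM, 10)  # assumed 10 SMPL shape coefficients
w_cam = init(DIM, 3)     # assumed weak-perspective camera (scale, tx, ty)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def token_update(tokens, image_feats, w_q, w_k, w_v):
    """Tokens attend over themselves and the image features (residual update)."""
    context = np.concatenate([tokens, image_feats], axis=0)  # (T + N, DIM)
    q, k, v = tokens @ w_q, context @ w_k, context @ w_v
    attn = softmax(q @ k.T / np.sqrt(DIM))                   # (T, T + N)
    return tokens + attn @ v

def estimate_smpl(image_feats):
    """Map image features (N, DIM) to pose, shape, and camera estimates."""
    tokens = np.concatenate([joint_tokens, shape_token, camera_token], axis=0)
    for w_q, w_k, w_v in layers:
        tokens = token_update(tokens, image_feats, w_q, w_k, w_v)
    pose = tokens[:NUM_JOINTS] @ w_pose        # one rotation per joint token
    betas = tokens[NUM_JOINTS] @ w_shape       # from the shape token
    cam = tokens[NUM_JOINTS + 1] @ w_cam       # from the camera token
    return pose, betas, cam

# Example: a flattened 7x7 backbone feature map (assumed layout).
feats = np.random.default_rng(1).normal(size=(49, DIM))
pose, betas, cam = estimate_smpl(feats)
```

Because each joint keeps its own token, a per-joint sequence of token states across frames is available to a temporal model, which is the property the abstract relies on for its second contribution.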

Code

yangsenius/int_hmr_model official

Tasks

3D human pose and shape estimation

3D Human Pose Estimation

Pose Estimation

Datasets

Human3.6M

3DPW

Results from the Paper

Ranked #35 on 3D Human Pose Estimation on 3DPW
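
For readers unfamiliar with the reported metrics, the following sketch shows how MPJPE, PA-MPJPE, and acceleration error are commonly computed. It follows the usual definitions (mean joint distance, the same after a Procrustes similarity alignment, and the mean difference of second finite differences) and is not taken from the paper's evaluation code, which may differ in joint sets, root alignment, and units. The function names are hypothetical.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with rotation, uniform scale, and translation."""
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_pred, gt - mu_gt
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last singular direction if needed so the result is a proper rotation.
    d = np.ones(3)
    d[-1] = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag(d) @ u.T
    scale = (s * d).sum() / (p ** 2).sum()
    return scale * p @ rot.T + mu_gt

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)

def accel_error(pred_seq, gt_seq):
    """Mean joint acceleration difference for sequences of shape (T, J, 3), T >= 3."""
    acc_pred = pred_seq[:-2] - 2 * pred_seq[1:-1] + pred_seq[2:]
    acc_gt = gt_seq[:-2] - 2 * gt_seq[1:-1] + gt_seq[2:]
    return float(np.linalg.norm(acc_pred - acc_gt, axis=-1).mean())
```

PA-MPJPE removes global rotation, scale, and translation before measuring error, which is why it is lower than MPJPE for the same model. Acceleration error is computed over consecutive frames, so it reflects temporal jitter and not per-frame accuracy.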

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
