IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation

6 Aug 2022  ·  Zhongwei Qiu, Qiansheng Yang, Jian Wang, Dongmei Fu ·

Video 3D human pose estimation aims to localize the 3D coordinates of human joints from videos. Recent transformer-based approaches focus on capturing spatiotemporal information from sequences of 2D poses, but they cannot model contextual depth effectively because the visual depth cues are discarded during 2D pose estimation. In this paper, we simplify this paradigm into an end-to-end framework, the Instance-guided Video Transformer (IVT), which learns spatiotemporal contextual depth information from visual features and predicts 3D poses directly from video frames. Specifically, we first formulate video frames as a series of instance-guided tokens, where each token is responsible for predicting the 3D pose of one human instance. These tokens carry body-structure information because they are extracted under the guidance of joint offsets from the human center to the corresponding body joints. The tokens are then fed into IVT to learn spatiotemporal contextual depth. In addition, we propose a cross-scale instance-guided attention mechanism to handle the varying scales of multiple persons. Finally, the 3D pose of each person is decoded from its instance-guided token by coordinate regression. Experiments on three widely used 3D pose estimation benchmarks show that the proposed IVT achieves state-of-the-art performance.
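The abstract outlines a concrete pipeline: pool visual features into one token per person under the guidance of center-to-joint offsets, run a spatiotemporal transformer over the tokens, and regress 3D joint coordinates per token. Below is a minimal PyTorch sketch of that flow. All module names, tensor shapes, and hyperparameters here are assumptions for illustration, not the authors' implementation, and the cross-scale instance-guided attention is omitted:

```python
# Hypothetical sketch of the IVT flow described in the abstract; shapes,
# layer choices, and the offset-guided sampling scheme are all assumptions.
import torch
import torch.nn as nn


class InstanceGuidedTokens(nn.Module):
    """Builds one token per human instance by sampling backbone features
    at the person center and at the joints implied by predicted
    center-to-joint offsets (an assumed reading of the abstract)."""

    def __init__(self, feat_dim=256, num_joints=17):
        super().__init__()
        self.num_joints = num_joints
        # Predict 2D offsets from the human center to each body joint.
        self.offset_head = nn.Conv2d(feat_dim, 2 * num_joints, kernel_size=1)
        # Fuse the center feature with the J sampled joint features.
        self.fuse = nn.Linear(feat_dim * (num_joints + 1), feat_dim)

    def forward(self, feats, centers):
        # feats: (B, C, H, W) backbone features; centers: (B, N, 2) xy coords.
        B, C, H, W = feats.shape
        offsets = self.offset_head(feats)  # (B, 2*J, H, W)
        tokens = []
        for b in range(B):
            per_image = []
            for cx, cy in centers[b].long():
                cx, cy = cx.clamp(0, W - 1), cy.clamp(0, H - 1)
                center_feat = feats[b, :, cy, cx]                     # (C,)
                off = offsets[b, :, cy, cx].view(self.num_joints, 2)  # (J, 2)
                joints = torch.stack([cx, cy]).float() + off          # (J, 2)
                jx = joints[:, 0].long().clamp(0, W - 1)
                jy = joints[:, 1].long().clamp(0, H - 1)
                joint_feats = feats[b][:, jy, jx].T.reshape(-1)       # (J*C,)
                per_image.append(torch.cat([center_feat, joint_feats]))
            tokens.append(torch.stack(per_image))
        return self.fuse(torch.stack(tokens))  # (B, N, C)


class IVTSketch(nn.Module):
    """Assumed end-to-end flow: per-frame instance tokens -> spatiotemporal
    transformer over all frames -> 3D coordinate regression per token."""

    def __init__(self, feat_dim=256, num_joints=17):
        super().__init__()
        self.num_joints = num_joints
        self.tokenizer = InstanceGuidedTokens(feat_dim, num_joints)
        layer = nn.TransformerEncoderLayer(feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.regress = nn.Linear(feat_dim, 3 * num_joints)

    def forward(self, feats_seq, centers_seq):
        # feats_seq: T tensors of (B, C, H, W); centers_seq: T of (B, N, 2).
        tokens = [self.tokenizer(f, c) for f, c in zip(feats_seq, centers_seq)]
        seq = torch.cat(tokens, dim=1)  # (B, T*N, C): joint space-time attention
        seq = self.encoder(seq)
        B, TN, _ = seq.shape
        return self.regress(seq).view(B, TN, self.num_joints, 3)
```

Concatenating the tokens of all frames before the encoder lets attention mix information across both persons and time in one pass; the paper's actual factorization of spatial versus temporal attention may differ.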


Results from the Paper


Ranked #10 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| 3D Human Pose Estimation | 3DPW | IVT (f=5) | PA-MPJPE | 46 | #42 |
| 3D Human Pose Estimation | Human3.6M | IVT (f=5) | Average MPJPE (mm) | 40.2 | #73 |
| 3D Human Pose Estimation | Human3.6M | IVT (f=5) | Using 2D ground-truth joints | No | #2 |
| 3D Human Pose Estimation | Human3.6M | IVT (f=5) | Multi-View or Monocular | Monocular | #1 |
| 3D Multi-Person Pose Estimation | Panoptic | IVT (f=5) | Average MPJPE (mm) | 48.4 | #10 |
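The metrics above are standard: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and PA-MPJPE applies a Procrustes (similarity) alignment first. A minimal NumPy sketch, assuming (J, 3) joint arrays in millimeters; benchmark protocols may differ in details such as root alignment:

```python
# Hedged sketch of the evaluation metrics; alignment conventions per
# benchmark may differ from the authors' exact protocol.
import numpy as np


def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance (mm)
    between predicted and ground-truth joints. pred, gt: (J, 3)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()


def pa_mpjpe(pred, gt):
    """PA-MPJPE: MPJPE after the optimal similarity transform (scale,
    rotation, translation) aligning the prediction to the ground truth."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    P, G = pred - mu_p, gt - mu_g
    # Orthogonal Procrustes: rotation maximizing alignment of P onto G.
    U, S, Vt = np.linalg.svd(P.T @ G)
    d = np.sign(np.linalg.det(U @ Vt))  # correct for reflections
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    scale = (S * np.diag(D)).sum() / (P ** 2).sum()
    aligned = scale * P @ R + mu_g
    return mpjpe(aligned, gt)
```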
