3D Human Pose Estimation
309 papers with code • 25 benchmarks • 47 datasets
3D Human Pose Estimation is a computer vision task that involves estimating the 3D positions and orientations of body joints and bones from 2D images or videos. The goal is to reconstruct the 3D pose of a person, often in real time, which can be used in a variety of applications, such as virtual reality, human-computer interaction, and motion analysis.
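To make the task concrete, here is a minimal, illustrative sketch of the common "lifting" formulation: a small network maps detected 2D joint coordinates to 3D joint positions. The weights below are random stand-ins, not a trained model; joint count, layer sizes, and function names are assumptions for illustration only.

```python
import numpy as np

# Illustrative sketch only: a tiny MLP that "lifts" 17 2D joint
# coordinates (x, y) to 3D (x, y, z), in the spirit of single-frame
# lifting approaches. Weights are randomly initialised here; a real
# model would be trained on paired 2D/3D poses.

N_JOINTS = 17
rng = np.random.default_rng(0)

def lift_2d_to_3d(pose_2d, w1, b1, w2, b2):
    """Map a flattened (17*2,) 2D pose to a (17, 3) 3D pose."""
    h = np.maximum(pose_2d @ w1 + b1, 0.0)  # hidden layer with ReLU
    out = h @ w2 + b2                       # linear output layer
    return out.reshape(N_JOINTS, 3)

# Untrained parameters, for shape checking only.
w1 = rng.standard_normal((N_JOINTS * 2, 64)) * 0.1
b1 = np.zeros(64)
w2 = rng.standard_normal((64, N_JOINTS * 3)) * 0.1
b2 = np.zeros(N_JOINTS * 3)

pose_2d = rng.standard_normal(N_JOINTS * 2)  # stand-in for detected 2D keypoints
pose_3d = lift_2d_to_3d(pose_2d, w1, b1, w2, b2)
print(pose_3d.shape)  # (17, 3)
```

In practice the 2D keypoints come from an off-the-shelf 2D pose detector, and the lifting network is trained with a 3D joint-position loss on motion-capture data.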
Latest papers
Lester: rotoscope animation through video object segmentation and tracking
This article introduces Lester, a novel method to automatically synthesise retro-style 2D animations from videos.
Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers
Due to the challenges in data collection, mainstream datasets of 3D human pose estimation are primarily composed of multi-view video data collected in laboratory environments, which contains rich spatial-temporal correlation information besides the image frame content.
Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework
However, there is still ample room for improvement as these methods often overlook the exploration of correlation between the 2D and 3D joint-level features.
Diffusion-based Pose Refinement and Multi-hypothesis Generation for 3D Human Pose Estimation
To address these two challenges, we propose a diffusion-based refinement framework called DRPose, which refines the output of deterministic models by reverse diffusion and achieves more suitable multi-hypothesis prediction for the current pose benchmark by multi-step refinement with multiple noises.
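The multi-hypothesis idea described above can be sketched as follows: perturb a deterministic pose estimate with several independent noise samples, then run a few reverse refinement steps per sample. The "denoiser" below is a placeholder that simply contracts toward the initial estimate; DRPose uses a learned network, and all names and step counts here are assumptions.

```python
import numpy as np

# Hedged sketch: generate multiple pose hypotheses by refining several
# independently noised copies of one deterministic estimate. The inner
# update is a stand-in for a learned reverse-diffusion step.

rng = np.random.default_rng(42)

def refine(pose_init, n_hypotheses=5, n_steps=4, sigma=0.1):
    hypotheses = []
    for _ in range(n_hypotheses):
        # forward step: perturb with a fresh noise sample
        pose = pose_init + sigma * rng.standard_normal(pose_init.shape)
        for _ in range(n_steps):
            # placeholder denoising step: shrink the residual toward
            # the deterministic estimate at each reverse step
            pose = pose_init + 0.5 * (pose - pose_init)
        hypotheses.append(pose)
    return np.stack(hypotheses)

pose_init = rng.standard_normal((17, 3))  # deterministic model output
hyps = refine(pose_init)
print(hyps.shape)  # (5, 17, 3)
```

Because each hypothesis starts from a different noise sample but shares the same refinement procedure, the set stays clustered around the deterministic estimate while still expressing uncertainty.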
STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion
This method can remarkably improve the smoothness of recovery results from video.
3D-LFM: Lifting Foundation Model
The lifting of 3D structure and camera from 2D landmarks is at the cornerstone of the entire discipline of computer vision.
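The 2D-3D relationship that lifting methods invert can be written down directly: under a weak-perspective camera, 2D landmarks are a scaled, rotated, translated view of the 3D structure, and recovering both the 3D points and the camera parameters from the 2D landmarks alone is the ill-posed lifting problem. A minimal sketch, with all values chosen for illustration:

```python
import numpy as np

# Weak-perspective projection: x_2d = s * (R @ X)[:2] + t.
# This is the forward model; "lifting" is the inverse problem of
# recovering X and the camera (s, R, t) from x_2d.

def weak_perspective_project(points_3d, scale, rotation, translation):
    """Project (N, 3) points to (N, 2) image coordinates."""
    rotated = points_3d @ rotation.T  # apply camera rotation
    return scale * rotated[:, :2] + translation

rng = np.random.default_rng(1)
X = rng.standard_normal((17, 3))  # hypothetical 3D joints
R = np.eye(3)                     # identity rotation for simplicity
t = np.array([0.5, -0.5])
x2d = weak_perspective_project(X, scale=2.0, rotation=R, translation=t)
print(x2d.shape)  # (17, 2)
```

The inverse is ambiguous (depth and scale trade off, and many 3D configurations share one projection), which is why lifting models rely on learned priors over plausible structure.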
WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
We address these limitations with WHAM (World-grounded Humans with Accurate Motion), which accurately and efficiently reconstructs 3D human motion in a global coordinate system from video.
VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data
To the best of our knowledge, VoxelKP is the first single-staged, fully sparse network that is specifically designed for addressing the challenging task of 3D keypoint estimation from LiDAR data, achieving state-of-the-art performances.
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis.
W-HMR: Human Mesh Recovery in World Space with Weak-supervised Camera Calibration and Orientation Correction
We propose a novel orientation correction module to allow the reconstructed human body to remain normal in world space.