The Human3.6M dataset is one of the largest motion capture datasets. It consists of 3.6 million human poses and corresponding images captured by a high-speed motion capture system. Video data is acquired at 50 Hz by 4 high-resolution progressive-scan cameras. The dataset contains activities performed by 11 professional actors in 17 scenarios (e.g., discussion, smoking, taking a photo, talking on the phone) and provides accurate 3D joint positions along with high-resolution videos.
734 PAPERS • 16 BENCHMARKS
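Because Human3.6M is captured at 50 Hz, frame indices map directly to time, and some motion-prediction work subsamples the sequences to a lower rate (for example 25 Hz) by keeping every second frame. The sketch below illustrates that arithmetic under stated assumptions: a frame is modeled as a plain list of (x, y, z) joints, the joint count is left open, and `subsample` and `frame_time` are hypothetical helper names, not part of any official Human3.6M toolkit.

```python
from typing import List, Sequence, Tuple

# One 3D joint position; a frame is a list of joints. The joint count is
# left open because it depends on which skeleton definition is used.
Joint = Tuple[float, float, float]
Frame = List[Joint]

CAPTURE_HZ = 50  # Human3.6M capture rate


def frame_time(index: int, hz: int = CAPTURE_HZ) -> float:
    """Timestamp in seconds of frame `index` for a stream sampled at `hz`."""
    return index / hz


def subsample(frames: Sequence[Frame], source_hz: int, target_hz: int) -> List[Frame]:
    """Keep every k-th frame, with k = source_hz // target_hz.

    Only integer factors are supported; other ratios would need interpolation.
    """
    if target_hz <= 0 or source_hz % target_hz != 0:
        raise ValueError("target_hz must be positive and divide source_hz evenly")
    step = source_hz // target_hz
    return list(frames[::step])


if __name__ == "__main__":
    # Hypothetical 2-second clip: 100 frames of a single dummy joint.
    clip = [[(float(i), 0.0, 0.0)] for i in range(100)]
    clip_25 = subsample(clip, CAPTURE_HZ, 25)
    print(len(clip_25), frame_time(len(clip)))
```

Restricting to integer factors keeps the sketch exact: every retained frame is an original capture, so no interpolation error is introduced.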
The 3D Poses in the Wild (3DPW) dataset is the first in-the-wild dataset with accurate 3D poses for evaluation. Other outdoor datasets exist, but they are all restricted to a small recording volume. 3DPW is also the first to include video footage taken from a moving phone camera.
346 PAPERS • 5 BENCHMARKS
AMASS is a large database of human motion unifying different optical marker-based motion capture datasets by representing them within a common framework and parameterization. AMASS is readily useful for animation, visualization, and generating training data for deep learning.
291 PAPERS • 1 BENCHMARK
The Extreme Pose Interaction (ExPI) dataset is a new person interaction dataset of Lindy Hop dancing actions. In Lindy Hop, the two dancers are called the leader and the follower. The authors recorded two couples of dancers in a multi-camera setup that was also equipped with a motion capture system. The ExPI dataset contains 16 different actions, some performed by both couples and some by only one of them. Each action was repeated five times to account for variability. More precisely, for each recorded sequence, ExPI provides: (i) multi-view videos at 25 FPS from all the cameras in the recording setup; (ii) mocap data (3D positions of 18 joints for each person) at 25 FPS, synchronized with the videos; (iii) camera calibration information; and (iv) 3D shapes as textured meshes for each frame.
13 PAPERS • 3 BENCHMARKS
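Each ExPI mocap frame holds 18 3D joints for each of the two dancers at 25 FPS, synchronized with the videos. The sketch below shows one way to handle such a frame. The flat layout (leader's 18 joints first, then the follower's, each as consecutive x, y, z) is a hypothetical assumption, not the documented file format, and `split_frame`, `mean_joint_distance`, and `timestamp` are illustrative names. The inter-dancer distance is just one simple example of a two-person quantity.

```python
import math
from typing import List, Sequence, Tuple

FPS = 25         # ExPI video and mocap rate
NUM_JOINTS = 18  # 3D joints per person
NUM_PERSONS = 2  # leader and follower

Joint = Tuple[float, float, float]


def split_frame(flat: Sequence[float]) -> Tuple[List[Joint], List[Joint]]:
    """Split one mocap frame given as 2 * 18 * 3 flat coordinates.

    ASSUMPTION (hypothetical layout): the leader's 18 joints come first,
    then the follower's, each joint stored as consecutive x, y, z values.
    """
    expected = NUM_PERSONS * NUM_JOINTS * 3
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    joints: List[Joint] = [
        (flat[i], flat[i + 1], flat[i + 2]) for i in range(0, expected, 3)
    ]
    return joints[:NUM_JOINTS], joints[NUM_JOINTS:]


def mean_joint_distance(a: Sequence[Joint], b: Sequence[Joint]) -> float:
    """Mean Euclidean distance between same-index joints of two skeletons."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def timestamp(frame_index: int) -> float:
    """Seconds from sequence start. Because mocap and video share the same
    rate and are synchronized, the same index is assumed to address the
    matching video frame."""
    return frame_index / FPS
```

`math.dist` requires Python 3.8 or later.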
The GTA Indoor Motion dataset (GTA-IM) emphasizes human-scene interactions in indoor environments. It consists of HD RGB-D image sequences of 3D human motion generated with a realistic game engine. The dataset has clean 3D human pose and camera pose annotations, and large diversity in human appearances, indoor environments, camera views, and human activities.
10 PAPERS • 2 BENCHMARKS
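With 3D human poses, camera pose annotations, and depth, a typical operation is moving between 3D points and image pixels. The sketch below uses a standard pinhole camera model. The world-to-camera convention (p_cam = R p + t) and the intrinsics parameters `fx`, `fy`, `cx`, `cy` are assumptions; GTA-IM's actual annotation format and conventions should be checked before relying on this.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def world_to_camera(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """p_cam = R @ p + t (ASSUMED world-to-camera extrinsic convention)."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def project(p_cam: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point to pixel coordinates (u, v)."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Inverse of `project` for a pixel whose depth is known (e.g. from an RGB-D frame)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
```

For example, with an identity rotation, zero translation, `fx = fy = 100`, and principal point (320, 240), the point (1, 0, 2) projects to pixel (370, 240), and back-projecting that pixel with depth 2 recovers the original point.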
The PATS dataset consists of a large and diverse collection of aligned pose, audio, and transcripts. With this dataset, the authors aim to provide a benchmark that helps develop technologies for virtual agents that generate natural and relevant gestures.
9 PAPERS • NO BENCHMARKS YET
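Aligning pose, audio, and transcripts ultimately comes down to mapping time spans onto sample indices at each stream's rate. The sketch below assumes word-level timestamps given as (word, start_seconds, end_seconds); that format, the pose frame rate, and the audio sample rate are hypothetical parameters, not PATS's documented values.

```python
import math
from typing import List, Tuple

# ASSUMED transcript format: (word, start_seconds, end_seconds).
Word = Tuple[str, float, float]


def index_span(start: float, end: float, rate: float) -> range:
    """Indices i of samples/frames at `rate` Hz whose interval
    [i / rate, (i + 1) / rate) overlaps the time span [start, end)."""
    if end <= start:
        return range(0)
    return range(math.floor(start * rate), math.ceil(end * rate))


def align_words(words: List[Word], pose_fps: float) -> List[Tuple[str, range]]:
    """Attach to each word the pose-frame indices it overlaps."""
    return [(w, index_span(s, e, pose_fps)) for w, s, e in words]
```

The same `index_span` works for any stream: at a hypothetical 15 FPS pose rate, a word spanning 1.0 s to 1.5 s overlaps frames 15 through 22, while at a hypothetical 16 kHz audio rate it covers 8000 samples.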