UNIPD-BPE (University of Padova Body Pose Estimation)

The University of Padova Body Pose Estimation dataset (UNIPD-BPE) is an extensive dataset for multi-sensor body pose estimation (BPE) containing both single-person and multi-person sequences with up to 4 interacting people. A network of 5 Microsoft Azure Kinect RGB-D cameras records synchronized high-definition RGB and depth data of the scene from multiple viewpoints, and the subjects’ poses are estimated using the Azure Kinect Body Tracking SDK. Simultaneously, full-body Xsens MVN Awinda inertial suits provide accurate poses and anatomical joint angles, together with raw data from the 17 IMUs in each suit. All cameras and inertial suits are hardware synchronized, and the pose of each camera relative to the inertial reference frame is calibrated before each sequence to ensure maximum overlap of the two sensing systems’ outputs.
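Because each camera's pose is calibrated relative to the inertial reference frame, a joint estimated in a camera frame can be expressed in the inertial frame with a rigid transform. The following is a minimal sketch of that mapping, not the dataset's own tooling: the function name, the 4x4 matrix layout, and the numeric extrinsics are hypothetical, and the actual calibration file format may differ.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def apply_rigid_transform(T: Matrix, point: Sequence[float]) -> List[float]:
    """Map a 3D point through a 4x4 homogeneous transform (rotation + translation)."""
    x, y, z = point
    # Only the first three rows are needed; the last row is [0, 0, 0, 1].
    return [T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3)]


# Hypothetical extrinsics (assumption, not taken from the dataset):
# camera rotated 90 degrees about Z and shifted 1 m along X of the inertial frame.
T_inertial_from_camera = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

joint_in_camera = [0.5, 0.0, 2.0]  # metres, camera frame (made-up value)
joint_in_inertial = apply_rigid_transform(T_inertial_from_camera, joint_in_camera)
print(joint_in_inertial)  # [1.0, 0.5, 2.0]
```

Once both systems' joints are expressed in the same frame, their outputs can be compared directly.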

This setup made it possible to record synchronized 3D poses of the people in the scene simultaneously via Xsens’ inverse kinematics algorithm (inertial motion capture) and via the Azure Kinect Body Tracking SDK (markerless motion capture). The additional raw data (RGB, depth, camera network configuration) allow users to assess the performance of any custom markerless motion capture algorithm (based on RGB, depth, or both). Further analyses can be carried out by varying the number of cameras used and/or their resolution and frame rate. Moreover, the raw angular velocities, linear accelerations, magnetic fields, and orientations from each IMU make it possible to develop and test multimodal BPE approaches that merge visual and inertial data. Finally, the precise body dimensions of each subject are provided. They include body height, weight, and segment lengths, measured before the beginning of each recording session. These measurements were used to scale the Xsens biomechanical model, and they also constitute a ground truth for assessing how accurately markerless BPE estimates each subject’s body dimensions.
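As an illustration of the two kinds of assessment described above, the sketch below computes a mean per-joint position error between a markerless pose and an inertial reference pose, and the error of an estimated segment length against a measured one. It assumes both poses are already time-aligned and expressed in the same frame; the joint names, units (metres), and values are hypothetical and do not reflect the dataset's file layout.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float, float]
Pose = Dict[str, Point]


def mpjpe(estimated: Pose, reference: Pose) -> float:
    """Mean Euclidean distance over the joints present in both poses."""
    shared = sorted(set(estimated) & set(reference))
    if not shared:
        raise ValueError("poses share no joints")
    return sum(math.dist(estimated[j], reference[j]) for j in shared) / len(shared)


def segment_length_error(pose: Pose, proximal: str, distal: str, measured: float) -> float:
    """Estimated segment length minus the measured (ground-truth) length."""
    return math.dist(pose[proximal], pose[distal]) - measured


# Made-up example poses for a single frame (assumption, for illustration only).
markerless = {"hip": (0.0, 0.0, 0.0), "knee": (0.03, 0.04, -0.43)}
inertial = {"hip": (0.0, 0.0, 0.0), "knee": (0.0, 0.0, -0.43)}

frame_error = mpjpe(markerless, inertial)  # about 0.025 m
thigh_error = segment_length_error(inertial, "hip", "knee", measured=0.45)  # about -0.02 m
```

Averaging these per-frame values over a sequence would give one number per algorithm and camera configuration, which is one way to compare the settings mentioned above.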

The recorded sequences include 15 participants performing a set of 12 activities of daily living (ADLs), e.g., walking, sitting, and jogging. The actions were chosen to present different challenges to BPE algorithms, including different movement speeds, self-occlusions, and complex body poses. In addition, multi-person sequences with up to 4 people performing a set of 7 different actions are provided. These sequences offer challenging scenarios in which multiple self-occluded persons move and interact in a restricted space. They allow assessing the accuracy of multi-person tracking algorithms, which must maintain a consistent ID for each detected person from frame to frame.
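One simple way to quantify frame-by-frame ID consistency is to count how often the track ID assigned to a given subject changes over a sequence. The sketch below is a simplified identity-switch count, not the dataset authors' evaluation protocol. It assumes that each detected person has already been associated with a ground-truth subject in every frame; the subject labels and track IDs are hypothetical.

```python
from typing import Dict, Iterable


def count_id_switches(frames: Iterable[Dict[str, int]]) -> int:
    """Count how many times a subject's assigned track ID changes.

    Each frame maps a ground-truth subject label to the track ID the
    tracker gave that subject. Subjects missing from a frame (for
    example, fully occluded) keep their last known ID for comparison.
    """
    last_id: Dict[str, int] = {}
    switches = 0
    for assignment in frames:
        for subject, track_id in assignment.items():
            if subject in last_id and last_id[subject] != track_id:
                switches += 1
            last_id[subject] = track_id
    return switches


# Made-up sequence: subjects A and B swap track IDs in the third frame,
# and B is not detected in the fourth frame.
sequence = [
    {"A": 1, "B": 2},
    {"A": 1, "B": 2},
    {"A": 2, "B": 1},
    {"A": 2},
    {"A": 2, "B": 1},
]
print(count_id_switches(sequence))  # 2
```

A tracker that keeps IDs perfectly consistent scores 0 on this count.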

The dataset contains a total of 13.3 h of RGB, depth, and markerless BPE data, corresponding to over 1,400,000 frames obtained from a calibrated network of 5 RGB-D cameras. The inertial suits, in turn, recorded 3 h of inertial motion capture data, corresponding to over 600,000 frames recorded by each of the 17 IMUs in every suit.
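As a rough sanity check on these totals, the average frame rate implied by each figure can be computed by dividing the frame count by the duration. This is a minimal sketch: the helper name is hypothetical, and it assumes each duration is the summed recording time of the streams being counted. Since the frame counts are stated as "over" a value, the results are lower bounds.

```python
def implied_rate_hz(frames: int, hours: float) -> float:
    """Average frames per second implied by a frame count and a duration."""
    return frames / (hours * 3600.0)


camera_rate = implied_rate_hz(1_400_000, 13.3)  # roughly 29.2 Hz
imu_rate = implied_rate_hz(600_000, 3.0)        # roughly 55.6 Hz

print(f"cameras: at least {camera_rate:.1f} Hz, IMUs: at least {imu_rate:.1f} Hz")
```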

License


  • CC0
