Dataset for large-scale yoga pose recognition with 82 classes.
4 PAPERS • NO BENCHMARKS YET
The NVIDIA HOPE datasets consist of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The toy grocery objects are readily available for purchase and have ideal size and weight for robotic manipulation. 3D textured meshes for generating synthetic training data are provided.
3 PAPERS • NO BENCHMARKS YET
The ICVL dataset is a hand pose estimation dataset that consists of 330K training frames and 2 testing sequences of 800 frames each. The dataset is collected from 10 different subjects, with 16 hand joints annotated in each frame.
Throughout the history of art, the pose, as the holistic abstraction of the human body's expression, has proven to be a constant in numerous studies. However, due to the enormous amount of data that so far had to be processed by hand, its crucial role in the formulaic recapitulation of art-historical motifs since antiquity could only be highlighted selectively. This is true even for the now automated estimation of human poses, as the domain-specific, sufficiently large data sets required for training computational models are either not publicly available or not indexed at a fine enough granularity. With the Poses of People in Art data set, we introduce the first openly licensed data set for estimating human poses in art and validating human pose estimators. It consists of 2,454 images from 22 art-historical depiction styles, including those that have increasingly turned away from lifelike representations of the body since the 19th century. A total of 10,749 human figures are precisely enclos
3 PAPERS • 1 BENCHMARK
Accurate 3D human pose estimation is essential for sports analytics, coaching, and injury prevention. However, existing datasets for monocular pose estimation do not adequately capture the challenging and dynamic nature of sports movements. In response, we introduce SportsPose, a large-scale 3D human pose dataset consisting of highly dynamic sports movements. With more than 176,000 3D poses from 24 different subjects performing 5 different sports activities, SportsPose provides a diverse and comprehensive set of 3D poses that reflect the complex and dynamic nature of sports movements. Contrary to other markerless datasets, we have quantitatively evaluated the precision of SportsPose by comparing our poses with a commercial marker-based system and achieved a mean error of 34.5 mm across all evaluation sequences. This is comparable to the error reported on the commonly used 3DPW dataset. We further introduce a new metric, local movement, which describes the movement of the wrist and ankle
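As a rough illustration of how a figure like the 34.5 mm mean error could be computed, the sketch below averages the Euclidean distance between matching 3D joints from two systems. It assumes the reported value is a mean per-joint distance; the function name `mean_joint_error` and the toy coordinates are hypothetical and are not taken from the SportsPose release.

```python
import math


def mean_joint_error(predicted, reference):
    """Average Euclidean distance between matching 3D joints.

    Both arguments are lists of frames; each frame is a list of (x, y, z)
    tuples in the same unit (millimetres here) and the same joint order.
    """
    if len(predicted) != len(reference):
        raise ValueError("both sequences need the same number of frames")
    total, count = 0.0, 0
    for pred_frame, ref_frame in zip(predicted, reference):
        if len(pred_frame) != len(ref_frame):
            raise ValueError("each frame needs the same number of joints")
        for pred_joint, ref_joint in zip(pred_frame, ref_frame):
            total += math.dist(pred_joint, ref_joint)
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


# Toy data (hypothetical): one frame with two joints, in millimetres.
reference = [[(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]]
predicted = [[(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]]
error = mean_joint_error(predicted, reference)
print(f"mean error: {error:.1f} mm")
```

In this toy case the two joints are off by 5 mm and 12 mm, so the mean is 8.5 mm. Whether the two systems are first aligned in space and time is a separate choice that this sketch does not cover.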
CORSMAL is a dataset for estimating the position and orientation in 3D (or 6D pose) of an object from a single view. The dataset consists of 138,240 images of rendered hands and forearms holding 48 synthetic objects, split into 3 grasp categories over 30 real backgrounds.
2 PAPERS • NO BENCHMARKS YET
Largest, first-of-its-kind, in-the-wild, fine-grained workout/exercise posture analysis dataset, covering three different exercises: BackSquat, Barbell Row, and Overhead Press. Seven different types of exercise errors are covered. Unlabeled data is also provided to facilitate self-supervised learning.
Dataset page: https://github.com/mosamdabhi/MBW-Data
The MERL-RAV (MERL Reannotation of AFLW with Visibility) Dataset contains over 19,000 face images in a full range of head poses. Each face is manually labeled with the ground-truth locations of 68 landmarks, with the additional information of whether each landmark is unoccluded, self-occluded (due to extreme head poses), or externally occluded. The images were annotated by professional labelers, supervised by researchers at Mitsubishi Electric Research Laboratories (MERL).
2 PAPERS • 2 BENCHMARKS
The data includes all movement trajectories extracted from the videos of Parkinson's assessments using Convolutional Pose Machines (CPM) as well as the confidence values from CPM. The dataset also includes ground truth ratings of parkinsonism and dyskinesia severity using the UDysRS, UPDRS, and CAPSIT.
The Poser dataset is a dataset for pose estimation which consists of 1927 training and 418 test images. These images are synthetically generated and tuned to unimodal predictions. The images were generated using the Poser software package.
Rendered Handpose Dataset contains 41258 training and 2728 testing samples. Each sample provides:
The Retinal Microsurgery dataset is a dataset for surgical instrument tracking. It consists of 18 in-vivo sequences, each with 200 frames at a resolution of 1920 × 1080 pixels. The dataset is further divided into four instrument-dependent subsets. Each tool is annotated with n=3 joints, and there are c=2 semantic classes (tool and background).
Amateur Drawings is a dataset collected via the public demo of Animated Drawings, containing over 178,000 amateur drawings and corresponding user-accepted character bounding boxes, segmentation masks, and joint location annotations.
1 PAPER • NO BENCHMARKS YET
A large-scale hand pose dataset, collected using a novel capture method.
DensePose-Track is a dataset of videos where selected frames are annotated in the traditional DensePose manner.
Halpe-FullBody is a full-body keypoint dataset in which each person is annotated with 136 keypoints: 20 for the body, 6 for the feet, 42 for the hands, and 68 for the face. It is designed for the task of whole-body human pose estimation.
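To make the 136-keypoint layout concrete, here is a minimal sketch that splits one person's flat keypoint list into the four groups named above. The group sizes come from the description; the index order (body, feet, face, hands) and the helper name `split_keypoints` are assumptions for illustration and should be checked against the dataset's own documentation.

```python
# Group sizes as stated in the dataset description.
GROUP_SIZES = {"body": 20, "feet": 6, "hands": 42, "face": 68}


def split_keypoints(keypoints, order=("body", "feet", "face", "hands")):
    """Split a flat list of keypoints into named groups.

    `order` is an assumed index layout, not a confirmed one.
    """
    expected = sum(GROUP_SIZES[name] for name in order)
    if len(keypoints) != expected:
        raise ValueError(f"expected {expected} keypoints, got {len(keypoints)}")
    groups, start = {}, 0
    for name in order:
        end = start + GROUP_SIZES[name]
        groups[name] = keypoints[start:end]
        start = end
    return groups


# Toy person (hypothetical): keypoint i is (x=i, y=i, confidence=1.0).
person = [(float(i), float(i), 1.0) for i in range(136)]
groups = split_keypoints(person)
for name, points in groups.items():
    print(name, len(points))
```

Passing a different `order` tuple is enough to adapt the split if the real annotation files use another layout.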
The HumanoidRobotPose dataset is a dataset for real-time pose estimation of humanoid robots.
MacaquePose is an animal pose estimation dataset containing pictures of macaque monkeys and manually labeled annotations on them.
1 PAPER • 1 BENCHMARK
SIDOD is a new, publicly available image dataset generated by the NVIDIA Deep Learning Data Synthesizer, intended for use in object detection, pose estimation, and tracking applications. This dataset contains 144k stereo image pairs that synthetically combine 18 camera viewpoints of three photorealistic virtual environments with up to 10 objects (chosen randomly from the 21 object models of the YCB dataset) and flying distractors.
The SMOT dataset, Single sequence-Multi Objects Training, is collected to represent a practical scenario of collecting training images of new objects in the real world, i.e., a mobile robot with an RGB-D camera collects a sequence of frames while driving around a table to learn multiple objects and then tries to recognize the objects in different locations.
This is a pose estimation dataset, consisting of symmetric 3D shapes where multiple orientations are visually indistinguishable. The challenge is to predict all equivalent orientations when only one orientation is paired with each image during training (as is the scenario for most pose estimation datasets). In contrast to most pose estimation datasets, the full set of equivalent orientations is available for evaluation.
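One simple way to use a full set of equivalent orientations at evaluation time is to score a prediction against the nearest equivalent ground-truth rotation. The sketch below does this with the geodesic angle between rotation matrices. It is an illustrative assumption, not necessarily the metric this dataset's authors use, and the helper names and the four-fold symmetric toy shape are hypothetical.

```python
import math


def rotation_z(angle_deg):
    """3x3 rotation matrix for a rotation about the z-axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def geodesic_angle_deg(r1, r2):
    """Angle of the relative rotation between two rotation matrices."""
    # trace(r1^T r2) equals the sum of element-wise products.
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def symmetry_aware_error_deg(predicted, equivalent_rotations):
    """Error to the closest of the visually indistinguishable orientations."""
    return min(geodesic_angle_deg(predicted, r) for r in equivalent_rotations)


# Toy shape (hypothetical) with four-fold symmetry about the z-axis.
equivalents = [rotation_z(a) for a in (0, 90, 180, 270)]
predicted = rotation_z(100)

naive = geodesic_angle_deg(predicted, equivalents[0])
error = symmetry_aware_error_deg(predicted, equivalents)
print(f"against one label: {naive:.1f} deg, against all: {error:.1f} deg")
```

Here the prediction is 100 degrees away from the single paired label but only 10 degrees from an equivalent orientation, which is why evaluating against the full set matters. A minimum over equivalents still rewards a single guess, so it does not by itself test whether a model predicts all equivalent orientations.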
Vinegar Fly is a pose estimation dataset for fruit flies.
InfiniteRep is a synthetic, open-source dataset for fitness and physical therapy (PT) applications. It includes 1k videos of diverse avatars performing multiple repetitions of common exercises. It includes significant variation in the environment, lighting conditions, avatar demographics, and movement trajectories. From cadence to kinematic trajectory, each rep is done slightly differently -- just like real humans. InfiniteRep videos are accompanied by a rich set of pixel-perfect labels and annotations, including frame-specific repetition counts.
0 PAPERS • NO BENCHMARKS YET
Overview: The goal is to use simulation data to train neural networks to estimate the pose of a rover's camera with respect to a known target object.