The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
10,185 PAPERS • 93 BENCHMARKS
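To make the keypoint-detection task more concrete, here is a minimal Python sketch. It shows how COCO-style keypoint annotations are commonly stored, as a flat [x, y, v] list per person, and how a prediction can be scored with object keypoint similarity (OKS). The function names are hypothetical, and the per-keypoint sigma values are left to the caller. This is a simplified sketch, not the official pycocotools evaluator. In particular, it skips that evaluator's special handling of ground truths with no labeled keypoints.

```python
import math
from typing import List, Sequence, Tuple


def parse_keypoints(flat: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Group a flat COCO-style list [x1, y1, v1, x2, y2, v2, ...] into (x, y, v) triples.

    v = 0: not labeled, v = 1: labeled but not visible, v = 2: labeled and visible.
    """
    if len(flat) % 3 != 0:
        raise ValueError("keypoint list length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def oks(gt_flat: Sequence[float],
        pred_xy: Sequence[Tuple[float, float]],
        area: float,
        sigmas: Sequence[float]) -> float:
    """Simplified object keypoint similarity between one ground truth and one prediction.

    area plays the role of s^2 (object scale squared); k_i = 2 * sigma_i.
    Unlabeled ground-truth keypoints (v == 0) are ignored.
    Returns 0.0 if no keypoint is labeled (a simplification of the official evaluator).
    """
    gt = parse_keypoints(gt_flat)
    total, labeled = 0.0, 0
    for (gx, gy, v), (px, py), sigma in zip(gt, pred_xy, sigmas):
        if v == 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (gx - px) ** 2 + (gy - py) ** 2   # squared pixel distance
        k = 2.0 * sigma                         # per-keypoint falloff constant
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        labeled += 1
    return total / labeled if labeled else 0.0
```

A perfect prediction on every labeled keypoint scores 1.0. The score decays toward 0 as predicted keypoints drift away, and that decay is slower for large objects and for keypoints with larger sigmas.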
The 3D Poses in the Wild (3DPW) dataset is the first in-the-wild dataset with accurate 3D poses for evaluation. While other outdoor datasets exist, they are all restricted to a small recording volume. 3DPW is the first one that includes video footage taken from a moving phone camera.
340 PAPERS • 5 BENCHMARKS
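Accurate 3D poses are typically used to score estimators with a per-joint distance metric. The sketch below computes the mean per-joint position error (MPJPE), with optional root-joint alignment. MPJPE is a common 3D pose metric, but the exact protocol used by the 3DPW benchmarks is not stated here. The alignment step, units, and joint ordering in this sketch are all assumptions, and the function name is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Joint = Tuple[float, float, float]


def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint], root: Optional[int] = None) -> float:
    """Mean Euclidean distance between predicted and ground-truth 3D joints.

    If root is given, both skeletons are first translated so that joint `root`
    sits at the origin (root-relative evaluation, an assumed protocol).
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    if root is not None:
        pr, gr = pred[root], gt[root]
        pred = [tuple(a - b for a, b in zip(p, pr)) for p in pred]
        gt = [tuple(a - b for a, b in zip(g, gr)) for g in gt]
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
```

Root alignment removes any global translation error. A prediction that is correct up to a shift therefore scores 0 with `root` set, but not without it.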
Animal Kingdom is a large and diverse dataset that provides multiple annotated tasks to enable a more thorough understanding of natural animal behaviors. The wild animal footage used in the dataset was recorded at different times of day in an extensive range of environments, with variations in backgrounds, viewpoints, illumination and weather conditions. More specifically, the dataset contains 50 hours of annotated videos for localizing relevant animal behavior segments in long videos (the video grounding task), 30K video sequences for the fine-grained multi-label action recognition task, and 33K frames for the pose estimation task. Together these cover a diverse range of animals: 850 species across 6 major animal classes.
14 PAPERS • 2 BENCHMARKS
The DREAM dataset was introduced by the paper "Camera-to-Robot Pose Estimation from a Single Image" (ICRA 2020). It consists of synthetic images (both with and without domain randomization) of three different robot manipulators (Franka Emika's Panda, Kuka's LBR iiwa 7 R800, and Rethink Robotics' Baxter). It also includes real-world images of Franka Emika's Panda taken with various RGB-D cameras (Xbox 360 Kinect (XK), RealSense (RS), and Azure Kinect (AK)). Each instance in the dataset contains an RGB image, 3D and 2D keypoint coordinates, the global camera-to-robot transformation, and the joint state configuration (covering both revolute and prismatic joints) of the robot. Tasks such as estimating the robot pose (equivalently, the camera pose) from a single RGB image and camera-to-robot calibration can be conducted and evaluated on this dataset.
5 PAPERS • 1 BENCHMARK
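Because each DREAM instance pairs 3D keypoints, 2D keypoints and a camera-to-robot transformation, a natural consistency check is to project the 3D keypoints into the image and compare them with the 2D annotations. The Python sketch below does this with a zero-skew pinhole camera. Several things here are assumptions, not details confirmed by the dataset description: the intrinsics matrix K, the convention that (R, t) maps robot-frame points into the camera frame, and the helper names.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def to_camera(R: Matrix3, t: Sequence[float], p_robot: Point3) -> Point3:
    """Rigid transform p_cam = R @ p_robot + t (assumed robot-to-camera convention)."""
    return tuple(sum(R[i][j] * p_robot[j] for j in range(3)) + t[i] for i in range(3))


def project(K: Matrix3, p_cam: Point3) -> Tuple[float, float]:
    """Zero-skew pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point lies behind the camera")
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return (fx * x / z + cx, fy * y / z + cy)


def mean_reprojection_error(K: Matrix3, R: Matrix3, t: Sequence[float],
                            kp3d_robot: Sequence[Point3],
                            kp2d: Sequence[Tuple[float, float]]) -> float:
    """Mean pixel distance between projected 3D keypoints and their 2D annotations."""
    if len(kp3d_robot) != len(kp2d) or not kp2d:
        raise ValueError("need matching, non-empty keypoint lists")
    total = 0.0
    for p, (u, v) in zip(kp3d_robot, kp2d):
        u_hat, v_hat = project(K, to_camera(R, t, p))
        total += math.hypot(u_hat - u, v_hat - v)
    return total / len(kp2d)
```

A small mean reprojection error indicates that the transformation, intrinsics and keypoint annotations agree. The same quantity is what a PnP-style solver minimizes when recovering the camera-to-robot pose from 2D detections.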
The Few-Shot Object Learning (FewSOL) dataset can be used for object recognition with only a few images per object. It contains 336 real-world objects, with 9 RGB-D images per object taken from different views. Object segmentation masks, object poses and object attributes are provided. In addition, synthetic images generated from 330 3D object models are used to augment the dataset. The FewSOL dataset can be used to study a range of few-shot object recognition problems, such as classification, detection and segmentation, shape reconstruction, pose estimation, keypoint correspondence and attribute recognition.
4 PAPERS • NO BENCHMARKS YET
A new dataset with significant occlusions related to object manipulation.
The NVIDIA HOPE datasets consist of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The toy grocery objects are readily available for purchase and have ideal size and weight for robotic manipulation. 3D textured meshes for generating synthetic training data are provided.
3 PAPERS • NO BENCHMARKS YET
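Labeled 6-DoF poses together with 3D textured meshes make it possible to score a pose estimate directly on the object's geometry. The sketch below computes the average distance (ADD) between model points transformed by the ground-truth pose and by the estimated pose. ADD is a widely used 6-DoF pose metric. Whether any particular HOPE evaluation uses it is an assumption here, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_pose(R: Matrix3, t: Sequence[float], p: Point3) -> Point3:
    """Transform a model point by a 6-DoF pose: R @ p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_metric(R_gt: Matrix3, t_gt: Sequence[float],
               R_est: Matrix3, t_est: Sequence[float],
               model_points: Sequence[Point3]) -> float:
    """Mean distance between model points under the ground-truth and estimated poses."""
    if not model_points:
        raise ValueError("model_points must be non-empty")
    total = sum(math.dist(apply_pose(R_gt, t_gt, p), apply_pose(R_est, t_est, p))
                for p in model_points)
    return total / len(model_points)
```

In practice the model points would be sampled from the provided meshes. A pose is often counted as correct when ADD falls below a fraction of the object's diameter, though the threshold is a per-benchmark choice.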
Throughout the history of art, the pose, as the holistic abstraction of the human body's expression, has proven to be a constant in numerous studies. However, because of the enormous amount of data that so far had to be processed by hand, its crucial role in the formulaic recapitulation of art-historical motifs since antiquity could only be highlighted selectively. This is true even for the now-automated estimation of human poses, as the domain-specific, sufficiently large data sets required for training computational models are either not publicly available or not indexed at a fine enough granularity. With the Poses of People in Art data set, we introduce the first openly licensed data set for estimating human poses in art and validating human pose estimators. It consists of 2,454 images from 22 art-historical depiction styles, including those that have increasingly turned away from lifelike representations of the body since the 19th century. A total of 10,749 human figures are precisely enclos
3 PAPERS • 1 BENCHMARK
The goal of this dataset is to use simulation data to train neural networks that estimate the pose of a rover's camera with respect to a known target object.
0 PAPERS • NO BENCHMARKS YET