The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
10,147 PAPERS • 92 BENCHMARKS
Cityscapes is a large-scale database which focuses on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories (flat surfaces, humans, vehicles, constructions, objects, nature, sky, and void). The dataset consists of around 5000 fine annotated images and 20000 coarse annotated ones. Data was captured in 50 cities during several months, daytimes, and good weather conditions. It was originally recorded as video so the frames were manually selected to have the following features: large number of dynamic objects, varying scene layout, and varying background.
3,323 PAPERS • 54 BENCHMARKS
KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their necessities. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 (140 for training and 112 for testing) acquisitions – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images (from the visual odome
3,219 PAPERS • 141 BENCHMARKS
Click to add a brief description of the dataset (Markdown and LaTeX enabled).
55 PAPERS • NO BENCHMARKS YET
MEIS comprises a total of 2,639 images in the size of 1024 × 768 toward two recording views (Aortic Valve (AV) and Left Ventricle (LV)) with 1,521 (747 in AV + 774 in LV) images for training and 1,118 (559 in AV + 559 in LV) for testing, respectively. Each view must be detected with two objects to calculate the measurement indicators. That is in total with four object classes (two objects in each view): aortic root (AoR) and left atrium (LA) in AV; interventricular septum (IVS) and left ventricular posterior wall (LVPW) in LV. The medical meaning and purpose of each indicator are listed in the following: • AV: LA-Dimension and AoR-Dimension can be measured for calculating different indicators, such as AoR/LA ratio, to examine the state of the aortic valve. • LV: 6 measurements include IVSs, IVSd, LVIDs, LVIDd, LVPWs, and LVPWd. These concerned thicknesses and dimensions in LV recording are used to estimate other cardiac functions through specific medical formulas, including LV mass, LV
1 PAPER • 2 BENCHMARKS