Echocardiography, or cardiac ultrasound, is the most widely used and readily available imaging modality for assessing cardiac function and structure. Combining portable instrumentation, rapid image acquisition, and high temporal resolution without the risks of ionizing radiation, echocardiography is one of the most frequently used imaging studies in the United States and serves as the backbone of cardiovascular imaging. Echocardiography is both necessary and sufficient to diagnose many cardiovascular diseases, from heart failure to valvular heart disease. In addition to our deep learning model, we introduce a new large video dataset of echocardiograms for computer vision research. The EchoNet-Dynamic database includes 10,030 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline for studying cardiac motion and chamber sizes.
9 PAPERS • 2 BENCHMARKS
FMB contains 1,500 well-registered infrared and visible image pairs with 14 annotated pixel-level categories. It covers a wide range of pixel variations and severe environments, e.g., dense fog, heavy rain, and low-light conditions. Because it includes rich scenes under different illumination conditions, the FMB dataset helps fusion and segmentation models generalize better. We labeled 98.16% of all pixels into 14 categories: Road, Sidewalk, Building, Traffic Light, Traffic Sign, Vegetation, Sky, Person, Car, Truck, Bus, Motorcycle, Bicycle and Pole, which often appear in real-world autonomous driving and semantic understanding tasks.
9 PAPERS • 1 BENCHMARK
PASTIS is a benchmark dataset for panoptic and semantic segmentation of agricultural parcels from satellite image time series. It is composed of 2,433 one-square-kilometer patches in metropolitan France, for which sequences of satellite observations are assembled into a four-dimensional spatio-temporal tensor. The dataset contains both semantic and instance annotations, assigning to each pixel a semantic label and an instance id. An official 5-fold split is provided in the dataset's metadata.
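A minimal sketch of rotating the official 5-fold split into train/validation/test sets for cross-validation. It assumes each patch's metadata record carries an integer fold id (1 to 5) under a key named "Fold" and an identifier under "id"; both key names, and the rotation rule (test on fold k, validate on the next fold, train on the remaining three), are assumptions for illustration rather than the documented benchmark protocol.

```python
from typing import Dict, List, Tuple

def fold_split(records: List[Dict], test_fold: int, n_folds: int = 5) -> Tuple[list, list, list]:
    """Return (train, val, test) patch ids for one cross-validation round.

    Hypothetical rotation: `test_fold` is held out for testing, the next fold
    (wrapping around) is used for validation, and all other folds for training.
    """
    val_fold = test_fold % n_folds + 1  # folds are numbered 1..n_folds
    train, val, test = [], [], []
    for rec in records:
        fold = rec["Fold"]  # assumed metadata key holding the fold id
        if fold == test_fold:
            test.append(rec["id"])
        elif fold == val_fold:
            val.append(rec["id"])
        else:
            train.append(rec["id"])
    return train, val, test

# Toy metadata: ten patches spread evenly across five folds.
meta = [{"id": i, "Fold": i % 5 + 1} for i in range(10)]
train, val, test = fold_split(meta, test_fold=5)
print(train, val, test)  # [1, 2, 3, 6, 7, 8] [0, 5] [4, 9]
```

Rotating `test_fold` through 1 to 5 gives five rounds in which every patch is tested exactly once.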
TACO is a growing image dataset of waste in the wild. It contains images of litter taken in diverse environments: woods, roads and beaches. These images are manually labelled and segmented according to a hierarchical taxonomy to train and evaluate object detection algorithms. The annotations are provided in COCO format.
9 PAPERS • NO BENCHMARKS YET
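Since TACO's annotations follow the COCO format, they can be read with a plain JSON parser. The sketch below counts annotated instances per category; it uses a tiny hand-written stand-in for an annotation file (all ids, file names, category names and coordinates are invented), but the top-level keys "images", "annotations" and "categories" are those of the COCO format.

```python
import json
from collections import Counter

# Tiny stand-in for a COCO-format annotation file; all values are invented.
coco_text = """
{
  "images": [{"id": 1, "file_name": "img_001.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "Bottle"}, {"id": 2, "name": "Can"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 80],
     "segmentation": [[10, 20, 60, 20, 60, 100, 10, 100]], "area": 4000, "iscrowd": 0},
    {"id": 11, "image_id": 1, "category_id": 1, "bbox": [200, 40, 30, 60],
     "segmentation": [[200, 40, 230, 40, 230, 100, 200, 100]], "area": 1800, "iscrowd": 0},
    {"id": 12, "image_id": 1, "category_id": 2, "bbox": [300, 300, 40, 40],
     "segmentation": [[300, 300, 340, 300, 340, 340, 300, 340]], "area": 1600, "iscrowd": 0}
  ]
}
"""

def instances_per_category(coco: dict) -> dict:
    """Map each category name to the number of annotated instances."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return dict(counts)

coco = json.loads(coco_text)
print(instances_per_category(coco))  # {'Bottle': 2, 'Can': 1}
```

For real files, the `COCO` class in pycocotools offers indexed access to the same structures.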
2-PM Vessel is an open-source volumetric brain vasculature dataset obtained with two-photon microscopy by Dr. Alison Burgess, Charissa Poon and Marc Santos at the Focused Ultrasound Lab, Sunnybrook Research Institute (affiliated with the University of Toronto). The dataset contains a total of 12 volumetric stacks consisting of images of mouse brain vasculature and tumour vasculature.
8 PAPERS • NO BENCHMARKS YET
BIMCV-COVID19+ is a large dataset of chest X-ray (CXR; CR and DX) and computed tomography (CT) images of COVID-19 patients, along with their radiographic findings, pathologies, polymerase chain reaction (PCR), immunoglobulin G (IgG) and immunoglobulin M (IgM) diagnostic antibody tests, and radiographic reports, from the Medical Imaging Databank of the Valencian Region (BIMCV). The findings are mapped onto standard Unified Medical Language System (UMLS) terminology and cover a wide spectrum of thoracic entities, in contrast to the much smaller number of entities annotated in previous datasets. Images are stored in high resolution, and entities are localized with anatomical labels in the Medical Imaging Data Structure (MIDS) format. In addition, 23 images were annotated by a team of expert radiologists with semantic segmentation of radiographic findings. Moreover, extensive information is provided, including the patient’s demographic information, type
The RIT-18 dataset was built for the semantic segmentation of remote sensing imagery. It was collected with the Tetracam Micro-MCA6 multispectral imaging sensor flown on-board a DJI-1000 octocopter.
RailSem19 offers 8,500 unique images taken from the ego perspective of a rail vehicle (trains and trams). Extensive semantic annotations are provided, both geometry-based (rail-relevant polygons, all rails as polylines) and dense label maps with many Cityscapes-compatible road labels. Many frames show areas of intersection between road and rail vehicles (railway crossings, trams driving on city streets). RailSem19 is useful for rail and road applications alike.
The TrashCan dataset is an instance segmentation dataset of underwater trash. It comprises 7,212 annotated images containing observations of trash, ROVs, and a wide variety of undersea flora and fauna. The annotations take the form of instance segmentation masks: bitmaps marking which pixels in the image belong to each object. The imagery in TrashCan is sourced from the J-EDI (JAMSTEC E-Library of Deep-sea Images) dataset, curated by the Japan Agency for Marine-Earth Science and Technology (JAMSTEC).
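Because each TrashCan instance is stored as a bitmap, per-instance quantities such as pixel area and a tight bounding box can be derived directly from the mask. A minimal pure-Python sketch on a toy mask (a list of rows of 0/1 values); decoding the dataset's actual on-disk mask encoding is deliberately not covered and would need to be checked against the release.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # rows of 0/1 values; 1 marks a pixel belonging to the object

def mask_area_and_bbox(mask: Mask) -> Tuple[int, Optional[Tuple[int, int, int, int]]]:
    """Return (area, bbox), with bbox as (x_min, y_min, width, height), or (0, None) if empty."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return 0, None
    x_min, y_min = min(xs), min(ys)
    bbox = (x_min, y_min, max(xs) - x_min + 1, max(ys) - y_min + 1)
    return len(xs), bbox

toy = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_area_and_bbox(toy))  # (5, (1, 1, 3, 2))
```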
A multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces.
8 PAPERS • 2 BENCHMARKS
RoadAnomaly21 is a dataset for anomaly segmentation, the task of identifying image regions that contain objects never seen during training. It consists of an evaluation set of 100 images with pixel-level annotations. Each image contains at least one anomalous object, e.g., animals or unknown vehicles. The anomalies can appear anywhere in the image and vary widely in size, covering from 0.5% to 40% of the image.
7 PAPERS • NO BENCHMARKS YET
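The size range quoted for RoadAnomaly21 (0.5% to 40% of the image) is the fraction of pixels labeled anomalous. A minimal sketch of that computation, assuming a label map in which 1 marks anomaly, 0 marks normal pixels and 255 marks void; these label values are assumptions, so check the released files' encoding before reusing it.

```python
from typing import List

ANOMALY = 1  # assumed label value for anomalous pixels (verify against the dataset)

def anomaly_fraction(label_map: List[List[int]]) -> float:
    """Fraction of all image pixels whose label equals ANOMALY."""
    total = sum(len(row) for row in label_map)
    anomalous = sum(1 for row in label_map for v in row if v == ANOMALY)
    return anomalous / total

# Toy 4x5 label map: 0 = normal, 1 = anomaly, 255 = assumed void.
labels = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 255],
    [0, 0, 0, 0, 255],
]
frac = anomaly_fraction(labels)
print(f"{frac:.1%}")           # 20.0%
print(0.005 <= frac <= 0.40)   # True: inside the quoted range
```

The denominator counts every pixel, void included, to match "of the image"; excluding void pixels would be an equally reasonable alternative.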
SketchyScene is a large-scale dataset of scene sketches to advance research on sketch understanding at both the object and scene level. The dataset is created through a novel and carefully designed crowdsourcing pipeline, enabling users to efficiently generate large quantities of realistic and diverse scene sketches. SketchyScene contains more than 29,000 scene-level sketches, 7,000+ pairs of scene templates and photos, and 11,000+ object sketches. All objects in the scene sketches have ground-truth semantic and instance masks. The dataset is also highly scalable and extensible, making it easy to augment or change scene composition.
SpaceNet 2: Building Detection v2 is a dataset for building footprint detection in geographically diverse settings from very high resolution satellite images. It contains 302,701 building footprints and 3-band and 8-band WorldView-3 satellite imagery at 0.3 m pixel resolution across 5 cities (Rio de Janeiro, Las Vegas, Paris, Shanghai, Khartoum), covering both urban and suburban areas. The dataset was split 60%/20%/20% into train/test/validation sets.
7 PAPERS • 1 BENCHMARK
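For readers building a comparable 60%/20%/20% partition of SpaceNet 2-style tiles, a seeded shuffle followed by slicing is one simple approach. This is a sketch under assumptions: it does not reproduce the official SpaceNet split, and the tile ids are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def split_60_20_20(items: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle deterministically, then slice into 60% train, 20% test, 20% validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG so the global state is untouched
    n = len(shuffled)
    n_train = n * 60 // 100  # integer arithmetic avoids float rounding surprises
    n_test = n * 20 // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]  # any remainder goes to validation
    return train, test, val

tiles = [f"tile_{i:03d}" for i in range(100)]  # hypothetical tile ids
train, test, val = split_60_20_20(tiles)
print(len(train), len(test), len(val))  # 60 20 20
```

Splitting at the tile level keeps every building footprint of a tile in a single subset; geographically clustered splits would be a stricter alternative.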
TTPLA is a public dataset of aerial images of Transmission Towers (TTs) and Power Lines (PLs). It can be used for detection and segmentation of transmission towers and power lines. It consists of 1,100 images with a resolution of 3,840×2,160 pixels and 8,987 manually labelled instances of TTs and PLs.
BIG is a high-resolution semantic segmentation dataset with 50 validation and 100 test objects. Image resolution in BIG ranges from 2048×1600 to 5000×3600. Every image in the dataset has been carefully labeled by a professional following the same guidelines as PASCAL VOC 2012, without the void region.
6 PAPERS • 1 BENCHMARK
The Freiburg Forest dataset was collected using a Viona autonomous mobile robot platform equipped with cameras for capturing multi-spectral and multi-modal images. The dataset may be used for evaluation of different perception algorithms for segmentation, detection, classification, etc. All scenes were recorded at 20 Hz with a camera resolution of 1024x768 pixels. The data was collected on three different days to have enough variability in lighting conditions as shadows and sun angles play a crucial role in the quality of acquired images. The robot traversed about 4.7 km each day. The dataset creators provide manually annotated pixel-wise ground truth segmentation masks for 6 classes: Obstacle, Trail, Sky, Grass, Vegetation, and Void.
6 PAPERS • 2 BENCHMARKS
OST300 is an outdoor scene dataset with 300 test images, together with a training set of images from 7 categories with rich textures.
6 PAPERS • NO BENCHMARKS YET
BRATS 2014 is a brain tumor segmentation dataset.
5 PAPERS • 1 BENCHMARK
The dataset offers tag and mask annotations for image-text pairs from the CC3M validation set. Tag annotations denote words that aptly describe the relationship between the image and the corresponding text. These annotations provide valuable insights into the semantic connection between each pair's visual and textual elements.
5 PAPERS • 2 BENCHMARKS
Cata7 is the first cataract surgical instrument dataset for semantic segmentation. The dataset consists of seven videos, each recording a complete cataract surgery. All videos are from Beijing Tongren Hospital. Each video is split into a sequence of images at a resolution of 1920×1080 pixels. To reduce redundancy, the videos are downsampled from 30 fps to 1 fps, and images without surgical instruments are manually removed. Each image is labeled with precise edges and types of surgical instruments. The dataset contains 2,500 images, divided into training and test sets: the training set consists of five video sequences and the test set of two.
5 PAPERS • NO BENCHMARKS YET
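For Cata7, downsampling from 30 fps to 1 fps amounts to keeping one frame per second, i.e. every 30th frame. A minimal sketch of the frame-index selection; which frame within each second the authors kept is not stated, so keeping the first one is an assumption, and the later manual removal of frames without instruments is a separate step not modeled here.

```python
from typing import List

def downsample_indices(n_frames: int, src_fps: int = 30, dst_fps: int = 1) -> List[int]:
    """Indices of frames kept when reducing src_fps to dst_fps (integer ratio only)."""
    if src_fps % dst_fps != 0:
        raise ValueError("this sketch only handles integer frame-rate ratios")
    step = src_fps // dst_fps
    # Assumption: keep the first frame of each step-sized window.
    return list(range(0, n_frames, step))

# A hypothetical 10-second clip at 30 fps has 300 frames; 10 survive at 1 fps.
kept = downsample_indices(300)
print(len(kept), kept[:3])  # 10 [0, 30, 60]
```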
The EntitySeg dataset contains 33,227 images with high-quality mask annotations. Compared with existing datasets, EntitySeg has three distinct properties. First, 71.25% of the images have a resolution of at least 2000×2000 px and 86.23% at least 1000×1000 px, which is more consistent with current digital imaging trends. Second, the dataset is open-world and not limited to predefined classes. Third, mask annotations along object boundaries are more accurate than in existing datasets.
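The EntitySeg resolution statistics are shares of images meeting a size threshold. A minimal sketch, assuming "at least 2000×2000 px" means both width and height reach the threshold; the image sizes below are hypothetical and not taken from EntitySeg.

```python
from typing import Iterable, Tuple

def share_at_least(sizes: Iterable[Tuple[int, int]], min_w: int, min_h: int) -> float:
    """Fraction of (width, height) pairs with width >= min_w and height >= min_h."""
    pairs = list(sizes)
    hits = sum(1 for w, h in pairs if w >= min_w and h >= min_h)
    return hits / len(pairs)

# Hypothetical image sizes for illustration only.
sizes = [(4032, 3024), (2048, 2048), (1920, 1080), (1200, 1000), (640, 480)]
print(share_at_least(sizes, 2000, 2000))  # 0.4
print(share_at_least(sizes, 1000, 1000))  # 0.8
```

Because every image that clears the larger threshold also clears the smaller one, the 2000 px share can never exceed the 1000 px share, consistent with 71.25% versus 86.23%.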
IDDA is a large-scale synthetic dataset for semantic segmentation with more than 100 different source visual domains. It was created to explicitly address the challenges of domain shift between training and test data under various weather and viewpoint conditions, in seven different city types.
LabPics Chemistry Dataset
MatterportLayout extends the Matterport3D dataset with general Manhattan layout annotations. It has 2,295 RGBD panoramic images from Matterport3D which are extended with ground truth 3D layouts.
PerSeg is a dataset for personalized segmentation. The raw images are collected from the training data of subject-driven diffusion models: DreamBooth, Textual Inversion, and Custom Diffusion. PerSeg contains 40 objects of various categories in total, including daily necessities, animals, and buildings. Each object appears in 5 to 7 images, in different poses or scenes, with our annotated masks.
Satlas is a remote sensing dataset and benchmark that is large in both breadth, covering a wide range of remote sensing applications, and scale, comprising 290M labels under 137 categories and 7 label modalities.
The Sunnybrook Cardiac Data (SCD), also known as the 2009 Cardiac MR Left Ventricle Segmentation Challenge data, consist of 45 cine-MRI images from a mix of patients and pathologies: healthy, hypertrophy, heart failure with infarction, and heart failure without infarction. A subset of this dataset was first used in the automated myocardium segmentation challenge from short-axis MRI, held by a MICCAI workshop in 2009. The complete dataset is now available in the CAP database under a public domain license.
TICaM is a Time-of-flight In-car Cabin Monitoring dataset for vehicle interior monitoring using a single wide-angle depth camera. It addresses the deficiencies of other available in-car cabin datasets in the range of labeled classes, recorded scenarios and provided annotations, all at once. It consists of an exhaustive list of actions performed while driving and multi-modal labeled images (depth, RGB and IR), with complete annotations for 2D and 3D object detection, instance and semantic segmentation, as well as activity annotations for RGB frames. In addition to real recordings, it contains a synthetic dataset of in-car cabin images with the same image modalities and annotations, providing a combination of synthetic and real data for training cabin monitoring systems and evaluating domain adaptation approaches.
UPLight is an underwater RGB-Polarization multimodal semantic segmentation dataset with 12 typical underwater semantic classes.
Embrapa Wine Grape Instance Segmentation Dataset (WGISD) contains grape clusters properly annotated in 300 images and a novel annotation methodology for segmentation of complex objects in natural images.
An enlarged dataset for studying how image backgrounds affect computer vision ML models, covering the following topics: blurred background, segmented background, AI-generated background, bias of tools during annotation, color in the background, dependent factors in the background, latent-space distance of the foreground, and random backgrounds with real environments.
Research on semantic segmentation of traffic scenes using color and polarization information (including training and testing sets).
Video sequences from a glasshouse environment in Campus Kleinaltendorf(CKA), University of Bonn, captured by PATHoBot, a glasshouse monitoring robot.
4 PAPERS • NO BENCHMARKS YET
The CropAndWeed dataset is focused on the fine-grained identification of 74 relevant crop and weed species with a strong emphasis on data variability. Annotations of labeled bounding boxes, semantic masks and stem positions are provided for about 112k instances in more than 8k high-resolution images of both real-world agricultural sites and specifically cultivated outdoor plots of rare weed types. Additionally, each sample is enriched with meta-annotations regarding environmental conditions.
The Kvasir-SEG dataset includes 196 polyps smaller than 10 mm classified as Paris class 1 sessile or Paris class IIa. These polyps were selected with the help of expert gastroenterologists and released separately as a subset of Kvasir-SEG called Kvasir-Sessile.
4 PAPERS • 1 BENCHMARK
The Middlebury 2001 is a stereo dataset of indoor scenes with multiple handcrafted layouts.
Northumberland Dolphin Dataset 2020 (NDD20) is a challenging image dataset annotated for both coarse and fine-grained instance segmentation and categorisation. This dataset, the first release of the NDD, was created in response to the rapid expansion of computer vision into conservation research and the production of field-deployable systems suited to extreme environmental conditions, an area with few open-source datasets. NDD20 contains a large collection of above- and below-water images of two different dolphin species for traditional coarse and fine-grained segmentation.
SAMRS is a remote sensing segmentation dataset which provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination.
SpaceNet 1: Building Detection v1 is a dataset for building footprint detection. The data comprises 382,534 building footprints, covering an area of 2,544 sq. km of 3-band and 8-band WorldView-2 imagery (0.5 m pixel resolution) across the city of Rio de Janeiro, Brazil. The images are processed as 200 m × 200 m tiles with associated building footprint vectors for training.
4 PAPERS • 2 BENCHMARKS
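The SpaceNet 1 tiling figures can be checked with simple arithmetic: at 0.5 m per pixel, a 200 m × 200 m tile is 400 × 400 pixels and covers 0.04 km², so covering 2,544 km² with non-overlapping tiles would take about 63,600 tiles. That count is an idealized figure assuming exact, non-overlapping coverage; the number of tiles actually released may differ.

```python
import math

def tile_pixels(tile_size_m: float, gsd_m: float) -> int:
    """Pixels along one side of a square tile at a given ground sample distance (m/pixel)."""
    return round(tile_size_m / gsd_m)

def ideal_tile_count(area_km2: float, tile_size_m: float) -> int:
    """Idealized number of non-overlapping square tiles needed to cover an area."""
    area_m2 = area_km2 * 1_000_000  # 1 km^2 = 1,000,000 m^2
    return math.ceil(area_m2 / (tile_size_m ** 2))

side = tile_pixels(200, 0.5)
print(side, side * side)            # 400 160000
print(ideal_tile_count(2544, 200))  # 63600
```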
TAS500 is a semantic segmentation dataset for autonomous driving in unstructured environments. TAS500 offers fine-grained vegetation and terrain classes to learn drivable surfaces and natural obstacles in outdoor scenes effectively.
AeroRIT is a hyperspectral dataset to facilitate aerial hyperspectral scene understanding.
3 PAPERS • NO BENCHMARKS YET
The Aircraft Context Dataset, a composition of two inter-compatible large-scale and versatile image datasets focusing on manned aircraft and UAVs, is intended for training and evaluating classification, detection and segmentation models in aerial domains. Additionally, a set of relevant meta-parameters can be used to quantify dataset variability as well as the impact of environmental conditions on model performance.
We design all-day CityScapes, an all-day semantic segmentation benchmark. It is the first semantic segmentation benchmark that contains samples from all-day scenarios, i.e., from dawn to night. Our dataset will be made publicly available at [https://isis-data.science.uva.nl/cv/1ADcityscape.zip].
3 PAPERS • 1 BENCHMARK
DeepSportradar is a benchmark suite of computer vision tasks, datasets and benchmarks for automated sport understanding. DeepSportradar currently supports four challenging tasks related to basketball: ball 3D localization, camera calibration, player instance segmentation and player re-identification. For each of the four tasks, a detailed description of the dataset, objective, performance metrics, and the proposed baseline method are provided.
The images in the DukeMTMC-attribute dataset come from Duke University. There are 1,812 identities and 34,183 annotated bounding boxes in the dataset. It contains 702 identities for training and 1,110 identities for testing, corresponding to 16,522 and 17,661 images respectively. Attributes are annotated at the identity level, and every image in the dataset is annotated with 23 attributes.
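The DukeMTMC-attribute split figures are internally consistent (702 + 1,110 = 1,812 identities; 16,522 + 17,661 = 34,183 images). Because attributes are annotated per identity, image-level labels follow by giving every image its identity's attribute vector. A minimal sketch; the attribute names, identity ids and file names are hypothetical, and only 3 of the 23 attributes are mocked.

```python
from typing import Dict, List, Tuple

# Hypothetical identity-level annotations: identity id -> attribute vector.
# Only 3 invented attributes are shown; the dataset uses 23.
ATTRIBUTES = ["backpack", "hat", "male"]
identity_attrs: Dict[int, List[int]] = {
    7: [1, 0, 1],
    12: [0, 1, 0],
}

# Hypothetical image list: (image file name, identity id).
images: List[Tuple[str, int]] = [
    ("0007_c1_f0001.jpg", 7),
    ("0007_c2_f0100.jpg", 7),
    ("0012_c1_f0042.jpg", 12),
]

def image_level_labels(images: List[Tuple[str, int]],
                       identity_attrs: Dict[int, List[int]]) -> Dict[str, List[int]]:
    """Give every image a copy of its identity's attribute vector."""
    return {name: list(identity_attrs[pid]) for name, pid in images}

labels = image_level_labels(images, identity_attrs)
print(labels["0007_c2_f0100.jpg"])  # [1, 0, 1]
```

Copying each vector keeps per-image labels independent if they are later edited, e.g. to mark an attribute as not visible in a particular frame.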
The Five-Billion-Pixels dataset contains more than 5 billion labeled pixels of 150 high-resolution Gaofen-2 (4 m) satellite images, annotated in a 24-category system covering artificial-constructed, agricultural, and natural classes. It offers rich categories, large coverage, wide distribution, and high spatial resolution, reflecting the distribution of real-world ground objects well and benefiting a variety of land-cover studies.
A video capsule endoscopy dataset for polyp segmentation.
MJU-Waste is an RGBD waste object segmentation dataset that is made public to facilitate future research in this area.