🔔 Share your dataset with the ML community!

Filter by Modality

Filter by Task (clear)

Filter by Language (clear)

64 dataset results for Semantic Segmentation AND English

MS COCO (Microsoft Common Objects in Context)

The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.

10,330 PAPERS • 93 BENCHMARKS

KITTI

KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their necessities. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 (140 for training and 112 for testing) acquisitions – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images (from the visual odome

3,264 PAPERS • 141 BENCHMARKS

DAVIS (Densely Annotated VIdeo Segmentation)

The Densely Annotation Video Segmentation dataset (DAVIS) is a high quality and high resolution densely annotated video segmentation dataset under two resolutions, 480p and 1080p. There are 50 video sequences with 3455 densely annotated frames in pixel level. 30 videos with 2079 frames are for training and 20 videos with 1376 frames are for validation.

644 PAPERS • 13 BENCHMARKS

Kvasir-SEG

Kvasir-SEG is an open-access dataset of gastrointestinal polyp images and corresponding segmentation masks, manually annotated by a medical doctor and then verified by an experienced gastroenterologist.

147 PAPERS • 3 BENCHMARKS

PASCAL VOC 2012 test

SCC Data Set

109 PAPERS • 3 BENCHMARKS

LIP (Look into Person)

The LIP (Look into Person) dataset is a large-scale dataset focusing on semantic understanding of a person. It contains 50,000 images with elaborated pixel-wise annotations of 19 semantic human part labels and 2D human poses with 16 key points. The images are collected from real-world scenarios and the subjects appear with challenging poses and view, heavy occlusions, various appearances and low resolution.

59 PAPERS • 1 BENCHMARK

LoveDA (Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation)

5987 high spatial resolution (0.3 m) remote sensing images from Nanjing, Changzhou, and Wuhan Focus on different geographical environments between Urban and Rural Advance both semantic segmentation and domain adaptation tasks Three considerable challenges: Multi-scale objects Complex background samples Inconsistent class distributions

49 PAPERS • 1 BENCHMARK

GID (Gaofen Image Dataset)

Gaofen Image Dataset (GID) is a large-scale land-cover dataset constructed with Gaofen-2 (GF-2) satellite images. This dataset has superiorities over the existing land-cover dataset because of its large coverage, wide distribution, and high spatial resolution. It contains 150 GF-2 images annotated at the pixel level for 5 categories: built-up, farmland, forest, meadow, and water.

24 PAPERS • NO BENCHMARKS YET

DigestPath

Introduced by Da et al. in DigestPath: a Benchmark Dataset with Challenge Review for the Pathological Detection and Segmentation of Digestive-System

22 PAPERS • 1 BENCHMARK

Kvasir-Instrument

Consists of annotated frames containing GI procedure tools such as snares, balloons and biopsy forceps, etc. Beside of the images, the dataset includes ground truth masks and bounding boxes and has been verified by two expert GI endoscopists.

15 PAPERS • 3 BENCHMARKS

DIVA-HisDB

The database consists of 150 annotated pages of three different medieval manuscripts with challenging layouts. Furthermore, we provide a layout analysis ground-truth which has been iterated on, reviewed, and refined by an expert in medieval studies.

14 PAPERS • 2 BENCHMARKS

FoodSeg103

FoodSeg103 is a new food image dataset containing 7,118 images. Images are annotated with 104 ingredient classes and each image has an average of 6 ingredient labels and pixel-wise masks. It's provided as a large-scale benchmark for food image segmentation.

14 PAPERS • 1 BENCHMARK

MCubeS

MCubeS (Multimodal Material Segmentation Dataset)

Multimodal material segmentation (MCubeS) dataset contains 500 sets of images from 42 street scenes. Each scene has images for four modalities: RGB, angle of linear polarization (AoLP), degree of linear polarization (DoLP), and near-infrared (NIR). The dataset provides annotated ground truth labels for both material and semantic segmentation for every pixel. The dataset is divided training set with 302 image sets, validation set with 96 image sets, and test set with 102 image sets. Each image has 1224 x 1024 pixels and a total of 20 class labels per pixel.

11 PAPERS • 1 BENCHMARK

FMB Dataset

FMB Dataset (Full-time Multi-modality Benchmark Dataset)

FMB contains 1500 well-registered infrared and visible image pairs with 14 annotated pixel-level categories. Also, it covers a wide range of pixel variations and various severe environments, e.g., dense fog, heavy rain, and low-light condition. The FMB dataset includes rich scenes under different illumination conditions, so that it enables fusion/segmentation model to improve the generalization ability greatly. We labeled 98.16% of all pixels into 14 different categories including Road, Sidewalk, Building, Traffic Light, Traffic Sign, Vegetation, Sky, Person, Car, Truck, Bus, Motorcycle, Bicycle and Pole, which often appear in real world automatic driving and semantic understanding tasks.

10 PAPERS • 1 BENCHMARK

BIMCV COVID-19

BIMCV-COVID19+ dataset is a large dataset with chest X-ray images CXR (CR, DX) and computed tomography (CT) imaging of COVID-19 patients along with their radiographic findings, pathologies, polymerase chain reaction (PCR), immunoglobulin G (IgG) and immunoglobulin M (IgM) diagnostic antibody tests and radiographic reports from Medical Imaging Databank in Valencian Region Medical Image Bank (BIMCV). The findings are mapped onto standard Unified Medical Language System (UMLS) terminology and they cover a wide spectrum of thoracic entities, contrasting with the much more reduced number of entities annotated in previous datasets. Images are stored in high resolution and entities are localized with anatomical labels in a Medical Imaging Data Structure (MIDS) format. In addition, 23 images were annotated by a team of expert radiologists to include semantic segmentation of radiographic findings. Moreover, extensive information is provided, including the patient’s demographic information, type

9 PAPERS • NO BENCHMARKS YET

WaterScenes

A Multi-Task 4D Radar-Camera Fusion Dataset for Autonomous Driving on Water Surfaces description of the dataset

8 PAPERS • 2 BENCHMARKS

Satlas

Satlas is a remote sensing dataset and benchmark that is large in both breadth, featuring all of the aforementioned applications and more, as well as scale, comprising 290M labels under 137 categories and 7 label modalities.

7 PAPERS • NO BENCHMARKS YET

SpaceNet 2 (SpaceNet 2: Building Detection v2)

SpaceNet 2: Building Detection v2 - is a dataset for building footprint detection in geographically diverse settings from very high resolution satellite images. It contains over 302,701 building footprints, 3/8-band Worldview-3 satellite imagery at 0.3m pixel res., across 5 cities (Rio de Janeiro, Las Vegas, Paris, Shanghai, Khartoum), and covers areas that are both urban and suburban in nature. The dataset was split using 60%/20%/20% for train/test/validation.

7 PAPERS • 1 BENCHMARK

CC3M-TagMask

The dataset offers tag and mask annotations for image-text pairs from the CC3M validation set. Tag annotations denote words that aptly describe the relationship between the image and the corresponding text. These annotations provide valuable insights into the semantic connection between each pair's visual and textual elements.

5 PAPERS • 2 BENCHMARKS

Kvasir-Sessile dataset (Sessile polyps from Kvasir-SEG)

The Kvasir-SEG dataset includes 196 polyps smaller than 10 mm classified as Paris class 1 sessile or Paris class IIa. We have selected it with the help of expert gastroenterologists. We have released this dataset separately as a subset of Kvasir-SEG. We call this subset Kvasir-Sessile.

5 PAPERS • 1 BENCHMARK

LabPics (LabPics Dataset for computer vision for autonomous chemistry labs and medical labs)

LabPics Chemistry Dataset

5 PAPERS • NO BENCHMARKS YET

PhenoBench (PhenoBench — A Large Dataset and Benchmarks for Semantic Image Interpretation in the Agricultural Domain)

The PhenoBench dataset contains multiple image segmentation challenges from the agricultural domain.

5 PAPERS • NO BENCHMARKS YET

Sunnybrook Cardiac Data

The Sunnybrook Cardiac Data (SCD), also known as the 2009 Cardiac MR Left Ventricle Segmentation Challenge data, consist of 45 cine-MRI images from a mixed of patients and pathologies: healthy, hypertrophy, heart failure with infarction and heart failure without infarction. Subset of this data set was first used in the automated myocardium segmentation challenge from short-axis MRI, held by a MICCAI workshop in 2009. The whole complete data set is now available in the CAP database with public domain license.

5 PAPERS • NO BENCHMARKS YET

XImageNet-12 (XIMAGENET-12: An Explainable AI Benchmark Dataset for Model Robustness Evaluation)

Enlarge the dataset to understand how image background effect the Computer Vision ML model. With the following topics: Blur Background / Segmented Background / AI generated Background/ Bias of tools during annotation/ Color in Background / Dependent Factor in Background/ LatenSpace Distance of Foreground/ Random Background with Real Environment!

5 PAPERS • 1 BENCHMARK

CropAndWeed Dataset

The CropAndWeed dataset is focused on the fine-grained identification of 74 relevant crop and weed species with a strong emphasis on data variability. Annotations of labeled bounding boxes, semantic masks and stem positions are provided for about 112k instances in more than 8k high-resolution images of both real-world agricultural sites and specifically cultivated outdoor plots of rare weed types. Additionally, each sample is enriched with meta-annotations regarding environmental conditions.

4 PAPERS • NO BENCHMARKS YET

Open Images V7

Open Images is a computer vision dataset covering ~9 million images with labels spanning thousands of object categories. A subset of 1.9M includes diverse annotations types.

4 PAPERS • NO BENCHMARKS YET

PASTIS-R

PASTIS-R (Panoptic Segmentation of Radar and Optical Satellite image TIme Series)

Extension of the PASTIS benchmark with radar and optical image time series.

4 PAPERS • 2 BENCHMARKS

SpaceNet 1 (SpaceNet 1: Building Detection v1)

SpaceNet 1: Building Detection v1 is a dataset for building footprint detection. The data is comprised of 382,534 building footprints, covering an area of 2,544 sq. km of 3/8 band WorldView-2 imagery (0.5 m pixel res.) across the city of Rio de Janeiro, Brazil. The images are processed as 200m×200m tiles with associated building footprint vectors for training.

4 PAPERS • 2 BENCHMARKS

Aircraft Context Dataset

The Aircraft Context Dataset, a composition of two inter-compatible large-scale and versatile image datasets focusing on manned aircraft and UAVs, is intended for training and evaluating classification, detection and segmentation models in aerial domains. Additionally, a set of relevant meta-parameters can be used to quantify dataset variability as well as the impact of environmental conditions on model performance.

3 PAPERS • NO BENCHMARKS YET

Five-Billion-Pixels

The Five-Billion-Pixels dataset contains more than 5 billion labeled pixels of 150 high-resolution Gaofen-2 (4 m) satellite images, annotated in a 24-category system covering artificial-constructed, agricultural, and natural classes. It possesses the advantage of rich categories, large coverage, wide distribution, and high-spatial resolution, which well reflects the distributions of real-world ground objects and can benefit to different land cover related studies.

3 PAPERS • NO BENCHMARKS YET

KvasirCapsule-SEG

The dataset contains a Video capsule endoscopy dataset for polyp segmentation.

3 PAPERS • 1 BENCHMARK

Medico automatic polyp segmentation challenge (dataset)

The “Medico automatic polyp segmentation challenge” aims to develop computer-aided diagnosis systems for automatic polyp segmentation to detect all types of polyps (for example, irregular polyp, smaller or flat polyps) with high efficiency and accuracy. The main goal of the challenge is to benchmark semantic segmentation algorithms on a publicly available dataset, emphasizing robustness, speed, and generalization.

3 PAPERS • 1 BENCHMARK

OpenTTGames

OSAI introduces OpenTTGames - an open dataset aimed at evaluation of different computer vision tasks in Table Tennis: ball detection, semantic segmentation of humans, table and scoreboard and fast in-game events spotting.

3 PAPERS • NO BENCHMARKS YET

PETRAW

PETRAW (PEg TRAnsfer Workflow recognition by different modalities)

PETRAW data set was composed of 150 sequences of peg transfer training sessions. The objective of the peg transfer session is to transfer 6 blocks from the left to the right and back. Each block must be extracted from a peg with one hand, transferred to the other hand, and inserted in a peg at the other side of the board. All cases were acquired by a non-medical expert on the LTSI Laboratory from the University of Rennes. The data set was divided into a training data set composed of 90 cases and a test data set composed of 60 cases. A case was composed of kinematic data, a video, semantic segmentation of each frame, and workflow annotation.

3 PAPERS • 6 BENCHMARKS

Endotect Polyp Segmentation Challenge Dataset

A challenge that consists of three tasks, each targeting a different requirement for in-clinic use. The first task involves classifying images from the GI tract into 23 distinct classes. The second task focuses on efficiant classification measured by the amount of time spent processing each image. The last task relates to automatcially segmenting polyps.

2 PAPERS • 1 BENCHMARK

HERA RFI Detection

HERA RFI Detection (Hydrogen Epoch of Reionization Array (HERA))

This dataset contains simulated and expert-labelled spectrograms from two radio telescopes: the Hydrogen Epoch of Reionization Array (HERA) in South Africa and the Low-Frequency Array (LOFAR) in the Netherlands. These datasets are intended to test radio-frequency interference (RFI) detection schemes. This entry pertains to the HERA dataset specifically.

2 PAPERS • 1 BENCHMARK

HuTics (Human Deictic Gestures Dataset)

HuTics contains 2040 images showing how humans use deictic gestures to interact with various daily-life objects. The images are annotated by segmentation masks of the object(s) of interest. The original purpose of the data collection is for gesture-aware object-agnostic segmentation tasks.

2 PAPERS • NO BENCHMARKS YET

LOFAR RFI Detection

LOFAR RFI Detection (Low-Frequency Array (LOFAR) Radio Frequency Interference Detection)

2 PAPERS • 1 BENCHMARK

Mila Simulated Floods

Mila Simulated Floods Dataset is a 1.5 square km virtual world using the Unity3D game engine including urban, suburban and rural areas.

2 PAPERS • 1 BENCHMARK

NVGaze (NVGaze: An Anatomically-Informed Dataset for Low-Latency, Near-Eye Gaze Estimation)

Quality, diversity, and size of training dataset are critical factors for learning-based gaze estimators. We create two datasets satisfying these criteria for near-eye gaze estimation under infrared illumination: a synthetic dataset using anatomically-informed eye and face models with variations in face shape, gaze direction, pupil and iris, skin tone, and external conditions (two million images at 1280x960), and a real-world dataset collected with 35 subjects (2.5 million images at 640x480). Using our datasets, we train a neural network for gaze estimation, achieving 2.06 (+/- 0.44) degrees of accuracy across a wide 30 x 40 degrees field of view on real subjects excluded from training and 0.5 degrees best-case accuracy (across the same field of view) when explicitly trained for one real subject. We also train a variant of our network to perform pupil estimation, showing higher robustness than previous methods. Our network requires fewer convolutional layers than previous networks, ach

2 PAPERS • NO BENCHMARKS YET

OADAT

OADAT (OADAT: Experimental and Synthetic Clinical Optoacoustic Data for Standardized Image Processing)

An experimental and synthetic (simulated) OA raw signals and reconstructed image domain datasets rendered with different experimental parameters and tomographic acquisition geometries.

2 PAPERS • NO BENCHMARKS YET

25kTrees (Individual Tree Crown Annotations)

Manual crown delineation of individual trees in two countries: Denmark and Finland.

1 PAPER • NO BENCHMARKS YET

BCSD

BCSD (Bank Check Segmentation Dataset)

The dataset consists of images of 158 filled out bank checks containing various complex backgrounds, and handwritten text and signatures in the respective fields, along with both pixel-level and patch-level segmentation masks for the signatures on the checks. Please visit the dataset homepage for more details.

1 PAPER • NO BENCHMARKS YET

CheXlocalize

CheXlocalize is a radiologist-annotated segmentation dataset on chest X-rays. The dataset consists of two types of radiologist annotations for the localization of 10 pathologies: pixel-level segmentations and most-representative points. Annotations were drawn on images from the CheXpert validation and test sets. The dataset also consists of two separate sets of radiologist annotations: (1) ground-truth pixel-level segmentations on the validation and test sets, drawn by two board-certified radiologists, and (2) benchmark pixel-level segmentations and most-representative points on the test set, drawn by a separate group of three board-certified radiologists.

1 PAPER • NO BENCHMARKS YET

EBHI-Seg

EBHI-Seg is a dataset containing 5,170 images of six types of tumor differentiation stages and the corresponding ground truth images. The dataset can provide researchers with new segmentation algorithms for medical diagnosis of colorectal cancer.

1 PAPER • NO BENCHMARKS YET

EarthVQA (A multi-modal multi-task VQA dataset for remote sensing)

Earth vision research typically focuses on extracting geospatial object locations and categories but neglects the exploration of relations between objects and comprehensive reasoning. Based on city planning needs, we develop a multi-modal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6000 images, corresponding semantic masks, and 208,593 QA pairs with urban and rural governance requirements embedded.

1 PAPER • 1 BENCHMARK

FracAtlas (A Dataset for Fracture Classification, Localization and Segmentation of Musculoskeletal Radiographs)

FractureAtlas is a musculoskeletal bone fracture dataset with annotations for deep learning tasks like classification, localization, and segmentation. The dataset contains a total of 4,083 X-Ray images with annotation in COCO, VGG, YOLO, and Pascal VOC format. This dataset is made freely available for any purpose. The data provided within this work are free to copy, share or redistribute in any medium or format. The data might be adapted, remixed, transformed, and built upon. The dataset is licensed under a CC-BY 4.0 license. It should be noted that to use the dataset correctly, one needs to have knowledge of medical and radiology fields to understand the results and make conclusions based on the dataset. It's also important to consider the possibility of labeling errors.

1 PAPER • NO BENCHMARKS YET

GUISS dataset

GUISS dataset (Meshes, textures, Blend files, stereo datasets, depth maps, depth estimations))

We provide all the expected data inputs to GUISS such as meshes, texture images, and blend files. Generated datasets used in our experiments along with the stereo depth estimations can be downloaded. We have defined seven dataset types: scene_reconstructions, texture_variation, gaea_texture_variation, generative_texture, terrain_variation, rocks, and generative_texture_snow. Each dataset type contains renderings with varying values of different parameters such as lighting angle, texture imgs, albedo, etc. Position each dataset type folder under data/dataset/.

1 PAPER • NO BENCHMARKS YET

Datasets

64 dataset results for Semantic Segmentation AND English