196 dataset results for Object Detection AND Images

TICaM (Time-of-flight In-car Cabin Monitoring)

TICaM is a Time-of-flight In-car Cabin Monitoring dataset for vehicle interior monitoring using a single wide-angle depth camera. This dataset addresses the deficiencies of other available in-car cabin datasets in terms of the ambit of labeled classes, recorded scenarios and provided annotations, all at the same time. It consists of an exhaustive list of actions performed while driving and multi-modal labeled images (depth, RGB and IR), with complete annotations for 2D and 3D object detection, instance and semantic segmentation as well as activity annotations for RGB frames. In addition to real recordings, it also contains a synthetic dataset of in-car cabin images with the same multi-modality of images and annotations, providing a unique and extremely beneficial combination of synthetic and real data for effectively training cabin monitoring systems and evaluating domain adaptation approaches.

5 PAPERS • NO BENCHMARKS YET

CropAndWeed Dataset

The CropAndWeed dataset is focused on the fine-grained identification of 74 relevant crop and weed species with a strong emphasis on data variability. Annotations of labeled bounding boxes, semantic masks and stem positions are provided for about 112k instances in more than 8k high-resolution images of both real-world agricultural sites and specifically cultivated outdoor plots of rare weed types. Additionally, each sample is enriched with meta-annotations regarding environmental conditions.

4 PAPERS • NO BENCHMARKS YET

GRAZPEDWRI-DX

Digital radiography is widely available and the standard modality in trauma imaging, often enabling the diagnosis of pediatric wrist fractures. However, image interpretation requires time-consuming specialized training. Due to astonishing progress in computer vision algorithms, automated fracture detection has become a topic of research interest. This paper presents the GRAZPEDWRI-DX dataset containing annotated pediatric trauma wrist radiographs of 6,091 patients, treated at the Department for Pediatric Surgery of the University Hospital Graz between 2008 and 2018. A total of 10,643 studies (20,327 images) are made available, typically covering posteroanterior and lateral projections. The dataset is annotated with 74,459 image tags and features 67,771 labeled objects. We de-identified all radiographs and converted the DICOM pixel data to 16-bit grayscale PNG images. The filenames and the accompanying text files provide basic patient information (age, sex). Several pediatric radiolog

4 PAPERS • 1 BENCHMARK

InsPLAD (Inspection Power Line Asset Dataset)

InsPLAD is a dataset for power line asset inspection containing 10,607 high-resolution colour images captured by Unmanned Aerial Vehicles. It contains 17 unique power line assets captured from real-world operating power lines. Some of those assets (five, to be precise) are also annotated regarding their conditions. They present the following defects: corrosion (4 of them), broken/missing cap (1 of them), and bird's nest presence (1 of them).

4 PAPERS • 1 BENCHMARK

MJU-Waste

MJU-Waste is an RGBD waste object segmentation dataset that is made public to facilitate future research in this area.

4 PAPERS • 1 BENCHMARK

Open Images V7

Open Images is a computer vision dataset covering ~9 million images with labels spanning thousands of object categories. A subset of 1.9M images includes diverse annotation types.

4 PAPERS • NO BENCHMARKS YET

PIDray

PIDray is a large-scale dataset which covers various cases in real-world scenarios for prohibited item detection, especially for deliberately hidden items. The dataset contains 12 categories of prohibited items in 47,677 X-ray images with high-quality annotated segmentation masks and bounding boxes.

4 PAPERS • NO BENCHMARKS YET

RF100 (Roboflow 100)

The evaluation of object detection models is usually performed by optimizing a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are being modelled in practice, e.g. satellite, microscopic and gaming, making it difficult to assert the degree of generalization learned by the model.

4 PAPERS • 1 BENCHMARK

SOD4SB (Small Object Detection for Spotting Birds)

The Small Object Detection for Spotting Birds (SOD4SB) dataset consists of 39,070 images including 137,121 bird instances. The SOD4SB dataset contains a wide variety of small bird types and a variety of scenes.

4 PAPERS • 2 BENCHMARKS

SpaceNet 1 (SpaceNet 1: Building Detection v1)

SpaceNet 1: Building Detection v1 is a dataset for building footprint detection. The data is comprised of 382,534 building footprints, covering an area of 2,544 sq. km of 3/8 band WorldView-2 imagery (0.5 m pixel res.) across the city of Rio de Janeiro, Brazil. The images are processed as 200m×200m tiles with associated building footprint vectors for training.

4 PAPERS • 2 BENCHMARKS

VEDAI (Vehicle Detection in Aerial Imagery)

VEDAI is a dataset for Vehicle Detection in Aerial Imagery, provided as a tool to benchmark automatic target recognition algorithms in unconstrained environments. The vehicles contained in the database, in addition to being small, exhibit different variabilities such as multiple orientations, lighting/shadowing changes, specularities or occlusions. Furthermore, each image is available in several spectral bands and resolutions. A precise experimental protocol is also given, ensuring that the experimental results obtained by different people can be properly reproduced and compared. We also give the performance of some baseline algorithms on this dataset, for different settings of these algorithms, to illustrate the difficulties of the task and provide baseline comparisons.

4 PAPERS • 1 BENCHMARK

Aircraft Context Dataset

The Aircraft Context Dataset, a composition of two inter-compatible large-scale and versatile image datasets focusing on manned aircraft and UAVs, is intended for training and evaluating classification, detection and segmentation models in aerial domains. Additionally, a set of relevant meta-parameters can be used to quantify dataset variability as well as the impact of environmental conditions on model performance.

3 PAPERS • NO BENCHMARKS YET

FOD-A

FOD in Airports (FOD-A) is an image dataset of Foreign Object Debris (FOD), which consists of 31 object categories and over 30,000 annotation instances. The object categories have been selected based on guidance from prior documentation and related research by the Federal Aviation Administration (FAA).

3 PAPERS • NO BENCHMARKS YET

MUAD (Multiple Uncertainties for Autonomous Driving)

The MUAD dataset (Multiple Uncertainties for Autonomous Driving) consists of 10,413 realistic synthetic images with diverse adverse weather conditions (night, fog, rain, snow), out-of-distribution objects, and annotations for semantic segmentation, depth estimation, and object and instance detection. Predictive uncertainty estimation is essential for the safe deployment of Deep Neural Networks in real-world autonomous systems, and MUAD allows a better assessment of the impact of different sources of uncertainty on model performance.

3 PAPERS • NO BENCHMARKS YET

Parcel2D Real

Real-world dataset of ~400 images of cuboid-shaped parcels with full 2D and 3D annotations in the COCO format.

3 PAPERS • NO BENCHMARKS YET
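Several entries on this page, including Parcel2D Real above, state that their annotations use the COCO format. As a rough illustration only, the sketch below groups 2D boxes by image using the standard COCO keys (`images`, `annotations`, `categories`, and `bbox` as `[x_min, y_min, width, height]`). The function name `boxes_by_image`, the file name, and the example values are hypothetical and not taken from any dataset listed here; dataset-specific extensions such as 3D annotations are not covered.

```python
from collections import defaultdict


def boxes_by_image(coco):
    """Group COCO-style 2D boxes by image file name.

    `coco` is a dict with the standard COCO keys "images",
    "annotations" and "categories".
    """
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    labels = {cat["id"]: cat["name"] for cat in coco["categories"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        # COCO stores boxes as [x_min, y_min, width, height].
        x, y, w, h = ann["bbox"]
        # Convert to corner form (x_min, y_min, x_max, y_max).
        grouped[names[ann["image_id"]]].append(
            (labels[ann["category_id"]], (x, y, x + w, y + h))
        )
    return dict(grouped)


# Hand-made example (hypothetical values, not real dataset content).
example = {
    "images": [{"id": 1, "file_name": "parcel_0001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 7, "name": "parcel"}],
    "annotations": [
        {"id": 100, "image_id": 1, "category_id": 7, "bbox": [10, 20, 100, 50]}
    ],
}

grouped_example = boxes_by_image(example)
```

In practice the dictionary would typically come from `json.load` on the dataset's annotation file; the exact file layout depends on the dataset.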

PoPArt (Poses of People in Art: A Data Set for Human Pose Estimation in Digital Art History)

Throughout the history of art, the pose—as the holistic abstraction of the human body's expression—has proven to be a constant in numerous studies. However, due to the enormous amount of data that so far had to be processed by hand, its crucial role to the formulaic recapitulation of art-historical motifs since antiquity could only be highlighted selectively. This is true even for the now automated estimation of human poses, as domain-specific, sufficiently large data sets required for training computational models are either not publicly available or not indexed at a fine enough granularity. With the Poses of People in Art data set, we introduce the first openly licensed data set for estimating human poses in art and validating human pose estimators. It consists of 2,454 images from 22 art-historical depiction styles, including those that have increasingly turned away from lifelike representations of the body since the 19th century. A total of 10,749 human figures are precisely enclos

3 PAPERS • 1 BENCHMARK

RailEye3D Dataset

The RailEye3D dataset, a collection of train-platform scenarios for applications targeting passenger safety and automation of train dispatching, consists of 10 image sequences captured at 6 railway stations in Austria. Annotations for multi-object tracking are provided in both a unified format and the ground-truth format used in the MOTChallenge.

3 PAPERS • NO BENCHMARKS YET

Separated COCO

Separated COCO is an automatically generated subset of the COCO val dataset, collecting separated objects for a large variety of categories in real images in a scalable manner, where the target object's segmentation mask is separated into distinct regions by the occluder.

3 PAPERS • 1 BENCHMARK

TimberSeg 1.0

The TimberSeg 1.0 dataset is composed of 220 images showing wood logs in various environments and conditions in Canada. The images are densely annotated with segmentation masks for each log instance, as well as the corresponding bounding box and class label. This dataset aims at enabling autonomous forestry forwarders; it therefore contains nearly 2,500 instances of wood logs from an operator's point of view. Images were taken in the forest, near the roadside, in lumberyards and above timber-filled trailers. The logs were annotated considering a grasping perspective, meaning that only the logs above the piles and accessible are segmented.

3 PAPERS • NO BENCHMARKS YET

BdSLImset (Bangladeshi Sign Language Image Dataset)

Bangladeshi Sign Language Image Dataset (BdSLImset) is a dataset that contains images of different Bangladeshi sign letters.

2 PAPERS • NO BENCHMARKS YET

Deep PCB (Deep Printed Circuit Board)

DeepPCB

2 PAPERS • 1 BENCHMARK

Endotect Polyp Segmentation Challenge Dataset

A challenge that consists of three tasks, each targeting a different requirement for in-clinic use. The first task involves classifying images from the GI tract into 23 distinct classes. The second task focuses on efficient classification measured by the amount of time spent processing each image. The last task relates to automatically segmenting polyps.

2 PAPERS • 1 BENCHMARK

Forward-Looking Sonar Marine Debris Datasets

This dataset is made up of forward-looking sonar images containing ten classes of underwater debris. The dataset can be used for segmentation or object detection. Applications include training computer vision models for underwater robotics applications.

2 PAPERS • 1 BENCHMARK

Gun Detection Dataset

This is a gun detection dataset with 51K annotated gun images for gun detection and another 51K cropped gun chip images for gun classification, collected from a few different sources.

2 PAPERS • NO BENCHMARKS YET

Human-Parts

The Human-Parts dataset is a dataset for human body, face and hand detection with ~15k images. It contains ~106k different annotations, with multiple annotations per image.

2 PAPERS • NO BENCHMARKS YET

INRIA-Horse

The INRIA-Horse dataset consists of 170 horse images and 170 images without horses. All horses in all images are annotated with a bounding-box. The main challenges it offers are clutter, intra-class shape variability, and scale changes. The horses are mostly unoccluded, taken from approximately the side viewpoint, and face the same direction.

2 PAPERS • NO BENCHMARKS YET

Kvasir-Capsule

The Kvasir-Capsule dataset is the largest publicly released VCE dataset. In total, the dataset contains 47,238 labeled images and 117 videos, capturing anatomical landmarks as well as pathological and normal findings. The result is more than 4,741,621 images and video frames altogether.

2 PAPERS • NO BENCHMARKS YET

MSDA (Multi-source domain adaptation dataset for text recognition)

The dataset covers 5 domains (synthetic, document, street view, handwritten, and car license) with over five million images.

2 PAPERS • 2 BENCHMARKS

Occluded COCO

Occluded COCO is an automatically generated subset of the COCO val dataset, collecting partially occluded objects for a large variety of categories in real images in a scalable manner, where the target object is partially occluded but the segmentation mask is connected.

2 PAPERS • 1 BENCHMARK

Parcel3D

Synthetic dataset of over 13,000 images of damaged and intact parcels with full 2D and 3D annotations in the COCO format. For details see our paper and for visual samples our project page.

2 PAPERS • NO BENCHMARKS YET

S2TLD (SJTU Small Traffic Light Dataset)

S2TLD is a traffic light dataset containing 5,786 images of approximately 1,080 × 1,920 pixels and 720 × 1,280 pixels. It contains 14,130 instances in 5 categories (red, yellow, green, off and wait on). The scenes cover a decent variety of road scenes, including:
* Busy inner-city street scenes
* Dense stop-and-go traffic
* Strong changes in illumination/exposure
* Flickering/fluctuating traffic lights
* Multiple visible traffic lights
* Image parts that can be confused with traffic lights (e.g. large round tail lights)

2 PAPERS • NO BENCHMARKS YET

SmartCity

SmartCity consists of 50 images in total, collected from ten city scenes including office entrance, sidewalk, atrium, shopping mall, etc. Unlike existing crowd counting datasets, which contain images of hundreds or thousands of pedestrians and are taken almost entirely outdoors, SmartCity has few pedestrians per image and consists of both outdoor and indoor scenes: the average number of pedestrians is only 7.4, with a minimum of 1 and a maximum of 14.

2 PAPERS • NO BENCHMARKS YET

TBBR (Thermal Bridges on Building Rooftops)

The dataset of Thermal Bridges on Building Rooftops (TBBR dataset) consists of annotated combined RGB and thermal drone images with a height map. All images were converted to a uniform format of 3000×4000 pixels, aligned, and cropped to 2400×3400 to remove empty borders.

2 PAPERS • 2 BENCHMARKS

TNCR Dataset (Table Net Detection and Classification Dataset)

We present TNCR, a new table dataset with varying image quality collected from free open source websites. The TNCR dataset can be used for table detection in scanned document images and their classification into 5 different classes.

2 PAPERS • NO BENCHMARKS YET

ZeroWaste

ZeroWaste is a dataset for automatic waste detection and segmentation. This dataset contains over 1,800 fully segmented video frames collected from a real waste sorting plant along with waste material labels for training and evaluation of the segmentation methods, as well as over 6,000 unlabeled frames that can be further used for semi-supervised and self-supervised learning techniques. ZeroWaste also provides frames of the conveyor belt before and after the sorting process, comprising a novel setup that can be used for weakly-supervised segmentation.

2 PAPERS • NO BENCHMARKS YET

xView3-SAR

Unsustainable fishing practices worldwide pose a major threat to marine resources and ecosystems. Identifying vessels that do not show up in conventional monitoring systems, known as "dark vessels", is key to managing and securing the health of marine environments. With the rise of satellite-based synthetic aperture radar (SAR) imaging and modern machine learning (ML), it is now possible to automate detection of dark vessels day or night, under all-weather conditions. SAR images, however, require a domain-specific treatment and are not widely accessible to the ML community. Maritime objects (vessels and offshore infrastructure) are relatively small and sparse, challenging traditional computer vision approaches. We present the largest labeled dataset for training ML models to detect and characterize vessels and ocean structures in SAR imagery. xView3-SAR consists of nearly 1,000 analysis-ready SAR images from the Sentinel-1 mission that are, on average, 29,400-by-24,400 pixels each.

2 PAPERS • 1 BENCHMARK

360-SOD

360-SOD contains 500 high-resolution equirectangular images.

1 PAPER • NO BENCHMARKS YET

A Dataset of Multispectral Potato Plants Images

The dataset contains aerial agricultural images of a potato field with manual labels of healthy and stressed plant regions. The images were collected with a Parrot Sequoia multispectral camera carried by a 3DR Solo drone flying at an altitude of 3 meters. The dataset consists of RGB images with a resolution of 750×750 pixels, and spectral monochrome red, green, red-edge, and near-infrared images with a resolution of 416×416 pixels, and XML files with annotated bounding boxes of healthy and stressed potato crop.

1 PAPER • 1 BENCHMARK

Apron Dataset

The Apron Dataset focuses on training and evaluating classification and detection models for airport-apron logistics. In addition to bounding boxes and object categories the dataset is enriched with meta parameters to quantify the models’ robustness against environmental influences.

1 PAPER • NO BENCHMARKS YET

AquaTrash

This dataset contains 369 images of trash used for deep learning. Each image was manually labelled by our team for accurate detection, making a total of 470 bounding boxes. There are 4 classes in total: {(0: glass), (1: paper), (2: metal), (3: plastic)}.

1 PAPER • 1 BENCHMARK

BBBC041 (P. vivax (malaria) infected human blood smears)

P. vivax (malaria) infected human blood smears with bounding box annotations. The data consists of two classes of uninfected cells (RBCs and leukocytes) and four classes of infected cells (gametocytes, rings, trophozoites, and schizonts).

1 PAPER • NO BENCHMARKS YET

COCO Object Detection VIPriors subset

The training and validation data are subsets of the training split of the MS COCO dataset (2017 release, bounding boxes only). The test set is taken from the validation split of the MS COCO dataset.

1 PAPER • NO BENCHMARKS YET

CPPE-5 (Medical Personal Protective Equipment Dataset)

CPPE-5 (Medical Personal Protective Equipment) is a new challenging dataset with the goal of allowing the study of subordinate categorization of medical personal protective equipment, which is not possible with other popular data sets that focus on broad-level categories.

1 PAPER • 1 BENCHMARK

Cattle

Cattle data set, which was introduced in a paper. We (not the authors) created a train-val-test split.

1 PAPER • NO BENCHMARKS YET

DGTA-Cattle (DeepGTAV-Cattle)

Object detection data set created with the DeepGTAV engine, which is based on the video game GTAV. It is one of the three data sets proposed in the paper. This data set is motivated by the Cattle dataset and has almost the same classes.

1 PAPER • NO BENCHMARKS YET

DGTA-SeaDronesSee (DeepGTAV-SeaDronesSee)

1 PAPER • NO BENCHMARKS YET

DGTA-VisDrone (DeepGTAV-VisDrone)

1 PAPER • NO BENCHMARKS YET

Drinking Waste Classification

About the dataset: 4 classes of drinking waste: aluminium cans, glass bottles, PET (plastic) bottles and HDPE (plastic) milk bottles.
* rawimgs - images of the 4 classes of waste
* YOLO_imgs - images of the 4 classes of waste with corresponding txt files (annotations for the YOLO framework)
* labels.txt - labels of the classes

1 PAPER • 1 BENCHMARK
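The Drinking Waste Classification entry above mentions txt annotations for the YOLO framework. As a minimal sketch, assuming the files follow the common YOLO convention of one object per line written as `class_id x_center y_center width height` with coordinates normalised to the image size, a label line can be converted to a pixel-space box as follows. The function name `yolo_to_pixel_box` and the sample line are hypothetical; how class ids map to the names in labels.txt is not stated on this page and should be checked against the dataset itself.

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max).

    Assumes the usual YOLO layout: class id followed by the box centre
    and size, all normalised to the [0, 1] range.
    """
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    # Scale the normalised centre/size box to pixel corner coordinates.
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


# Hypothetical label line for a 400x200 image (not taken from the dataset).
sample_box = yolo_to_pixel_box("2 0.5 0.5 0.25 0.5", img_w=400, img_h=200)
```

Returning corner coordinates rather than centre and size is a design choice made here because most drawing and cropping routines expect corners.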