196 dataset results for Object Detection AND Images

The SKU110K dataset provides 11,762 images with more than 1.7 million annotated bounding boxes captured in densely packed scenarios, including 8,233 images for training, 588 images for validation, and 2,941 images for testing. There are around 1,733,678 instances in total. The images are collected from thousands of supermarket stores and are of various scales, viewing angles, lighting conditions, and noise levels. All the images are resized to a resolution of one megapixel. Most of the instances in the dataset are tightly packed and typically have an orientation in the range of [−15°, 15°].
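As a quick sanity check on the figures quoted for SKU110K, the short sketch below recomputes the total image count, the share of each split, and the mean number of boxes per image. The variable names are illustrative only and are not part of any official tooling.

```python
# Split sizes and instance count as quoted in the SKU110K description.
splits = {"train": 8233, "val": 588, "test": 2941}
total_instances = 1_733_678

# The three splits should add up to the stated 11,762 images.
total_images = sum(splits.values())

# Share of each split and average annotation density per image.
fractions = {name: count / total_images for name, count in splits.items()}
mean_boxes_per_image = total_instances / total_images

print(total_images)
print({name: round(frac, 2) for name, frac in fractions.items()})
print(round(mean_boxes_per_image, 1))
```

The splits work out to roughly 70% / 5% / 25%, and the density is on the order of 147 boxes per image, which is consistent with the "densely packed" characterization.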

20 PAPERS • 1 BENCHMARK

ModaNet

ModaNet is a street fashion images dataset consisting of annotations related to RGB images. ModaNet provides multiple polygon annotations for each image. Each polygon is associated with a label from 13 meta fashion categories. The annotations are based on images in the PaperDoll image set, which has only a few hundred images annotated by the superpixel-based tool.

19 PAPERS • 1 BENCHMARK

RADIATE (RAdar Dataset In Adverse weaThEr)

RADIATE (RAdar Dataset In Adverse weaThEr) is a new automotive dataset created by Heriot-Watt University which includes Radar, Lidar, Stereo Camera and GPS/IMU. The data is collected in different weather scenarios (sunny, overcast, night, fog, rain and snow) to help the research community develop new methods of vehicle perception. The radar images are annotated in 7 different scenarios: Sunny (Parked), Sunny/Overcast (Urban), Overcast (Motorway), Night (Motorway), Rain (Suburban), Fog (Suburban) and Snow (Suburban). The dataset contains 8 different types of objects (car, van, truck, bus, motorbike, bicycle, pedestrian and group of pedestrians).

19 PAPERS • 2 BENCHMARKS

TinyPerson

TinyPerson is a benchmark for tiny object detection in a long distance and with massive backgrounds. The images in TinyPerson are collected from the Internet. First, videos with a high resolution are collected from different websites. Second, images from the video are sampled every 50 frames. Then images with a certain repetition (homogeneity) are deleted, and the resulting images are annotated with 72,651 objects with bounding boxes by hand.
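The frame-sampling step described for TinyPerson can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name is hypothetical, and starting the sampling at frame 0 is an assumption the description does not confirm.

```python
def sampled_frame_indices(num_frames: int, stride: int = 50) -> list[int]:
    """Return the indices of frames kept when sampling one frame per `stride`.

    Assumes sampling starts at frame 0; the dataset description only
    states that images are sampled every 50 frames.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    return list(range(0, num_frames, stride))


# A 120-frame clip keeps three frames at the default stride of 50.
print(sampled_frame_indices(120))
```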

19 PAPERS • NO BENCHMARKS YET

Argoverse-HD

Argoverse-HD is a dataset built for streaming object detection, which encompasses real-time object detection, video object detection, tracking, and short-term forecasting. It contains the video data from Argoverse 1.1 with our own MS COCO-style bounding box annotations with track IDs. The annotations are backward-compatible with COCO as one can directly evaluate COCO pre-trained models on this dataset to estimate the efficiency or the cross-dataset generalization capability of the models. The dataset contains high-quality and temporally-dense annotations for high-resolution videos (1920 x 1200 @ 30 FPS). Overall, there are 70,000 image frames and 1.3 million bounding boxes.
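To make "COCO-style bounding box annotations with track IDs" concrete, here is a hedged sketch of what such records might look like and how they could be grouped into tracks. The `bbox` layout `[x, y, width, height]` follows the COCO convention; the `track_id` key, the sample values, and both helper functions are assumptions for illustration and may not match the actual Argoverse-HD files.

```python
from collections import defaultdict

# Hypothetical annotation records: COCO-style fields plus an assumed "track_id" key.
annotations = [
    {"image_id": 0, "category_id": 1, "bbox": [100.0, 200.0, 50.0, 80.0], "track_id": 7},
    {"image_id": 1, "category_id": 1, "bbox": [104.0, 201.0, 50.0, 80.0], "track_id": 7},
    {"image_id": 1, "category_id": 3, "bbox": [400.0, 300.0, 120.0, 60.0], "track_id": 9},
]


def group_by_track(anns):
    """Collect annotations that share a track ID, preserving input order."""
    tracks = defaultdict(list)
    for ann in anns:
        tracks[ann["track_id"]].append(ann)
    return dict(tracks)


def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


tracks = group_by_track(annotations)
print(sorted(tracks))                        # track IDs present in the sample
print(xywh_to_xyxy(annotations[0]["bbox"]))  # corner form of the first box
```

Because the per-frame records keep the standard COCO fields, a detector trained on COCO can be scored on them directly, while the extra track key supports tracking and forecasting tasks.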

17 PAPERS • 4 BENCHMARKS

OpenImages-v6

OpenImages V6 is a large-scale dataset consisting of 9 million training images, 41,620 validation samples, and 125,456 test samples. It is a partially annotated dataset with 9,600 trainable classes.

17 PAPERS • 3 BENCHMARKS

SceneNet

SceneNet is a dataset of labelled synthetic indoor scenes. There are several labeled indoor scenes, including:

17 PAPERS • NO BENCHMARKS YET

EORSSD (Extended Optical Remote Sensing Saliency Detection)

The Extended Optical Remote Sensing Saliency Detection (EORSSD) dataset is an extension of the ORSSD dataset. This new dataset is larger and more varied than the original. It contains 2,000 images and corresponding pixel-wise ground truth, which includes many semantically meaningful but challenging images.

16 PAPERS • NO BENCHMARKS YET

SeaDronesSee (SeaDronesSee: A Maritime Benchmark for Detecting Humans in Open Water)

SeaDronesSee is a large-scale data set aimed at helping develop systems for Search and Rescue (SAR) using Unmanned Aerial Vehicles (UAVs) in maritime scenarios. Building highly complex autonomous UAV systems that aid in SAR missions requires robust computer vision algorithms to detect and track objects or persons of interest. This data set provides three sets of tracks: object detection, single-object tracking and multi-object tracking. Each track consists of its own data set and leaderboard.

16 PAPERS • 3 BENCHMARKS

MALF (Multi-Attribute Labelled Faces)

The MALF dataset is a large dataset with 5,250 images annotated with multiple facial attributes and it is specifically constructed for fine grained evaluation.

15 PAPERS • NO BENCHMARKS YET

RPC (Retail Product Checkout)

RPC is a large-scale retail product checkout dataset and collects 200 retail SKUs. The collected SKUs can be divided into 17 meta categories, i.e., puffed food, dried fruit, dried food, instant drink, instant noodles, dessert, drink, alcohol, milk, canned food, chocolate, gum, candy, seasoner, personal hygiene, tissue, stationery.

15 PAPERS • NO BENCHMARKS YET

Washington RGB-D

Washington RGB-D is a widely used testbed in the robotics community, consisting of 41,877 RGB-D images organized into 300 instances divided into 51 classes of common indoor objects (e.g. scissors, cereal box, keyboard). Each object instance was positioned on a turntable and captured from three different viewpoints while rotating.

15 PAPERS • NO BENCHMARKS YET

MinneApple

MinneApple is a benchmark dataset for apple detection and segmentation. The fruits are labelled using polygonal masks for each object instance to aid in precise object detection, localization, and segmentation. The dataset also contains data for patch-based counting of clustered fruits. It contains over 41,000 annotated object instances in 1,000 images.

14 PAPERS • NO BENCHMARKS YET

UFDD (Unconstrained Face Detection Dataset)

Unconstrained Face Detection Dataset (UFDD) aims to fuel further research in unconstrained face detection.

13 PAPERS • NO BENCHMARKS YET

PeopleArt

People-Art is an object detection dataset which consists of people in 43 different styles. People contained in this dataset are quite different from those in common photographs. There are 42 categories of art styles and movements including Naturalism, Cubism, Socialist Realism, Impressionism, and Suprematism.

11 PAPERS • 2 BENCHMARKS

SODA10M

SODA10M is a large-scale object detection benchmark for standardizing the evaluation of different self-supervised and semi-supervised approaches by learning from raw data. SODA10M contains 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images are collected at a rate of one frame every ten seconds in 32 different cities under different weather conditions, periods and location scenes.

11 PAPERS • NO BENCHMARKS YET

TJU-DHD

TJU-DHD is a high-resolution dataset for object detection and pedestrian detection. The dataset contains 115,354 high-resolution images (52% images have a resolution of 1624×1200 pixels and 48% images have a resolution of at least 2,560×1,440 pixels) and 709,330 labelled objects in total with a large variance in scale and appearance.

11 PAPERS • 2 BENCHMARKS

AI-TOD (Tiny Object Detection in Aerial Images)

AI-TOD comes with 700,621 object instances for eight categories across 28,036 aerial images. Compared to existing object detection datasets in aerial images, the mean size of objects in AI-TOD is about 12.8 pixels, which is much smaller than others.

10 PAPERS • 1 BENCHMARK

DeepScores

DeepScores contains high-quality images of musical scores, partitioned into 300,000 sheets of written music that contain symbols of different shapes and sizes. It is intended to advance the state of the art in small-object recognition and to place the question of object recognition in the context of scene understanding.

10 PAPERS • NO BENCHMARKS YET

Hyper-Kvasir Dataset

The HyperKvasir dataset contains 110,079 images and 374 videos capturing anatomical landmarks as well as pathological and normal findings. Altogether, it comprises around 1 million images and video frames.

10 PAPERS • 2 BENCHMARKS

WiderPerson

WiderPerson contains a total of 13,382 images with 399,786 annotations, i.e., 29.87 annotations per image, which means this dataset contains dense pedestrians with various kinds of occlusions. Hence, pedestrians in the proposed dataset are extremely challenging due to large variations in the scenario and occlusion, which is suitable to evaluate pedestrian detectors in the wild.
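The per-image density quoted for WiderPerson follows directly from the two totals, as this small check shows. The variable names are illustrative.

```python
# Totals as quoted in the WiderPerson description.
num_images = 13_382
num_annotations = 399_786

# Average number of annotations per image, rounded as in the description.
annotations_per_image = num_annotations / num_images
print(round(annotations_per_image, 2))
```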

9 PAPERS • 1 BENCHMARK

Description Detection Dataset

Description Detection Dataset ($D^3$, /dikju:b/) is an attempt at creating a next-generation object detection dataset. Unlike traditional detection datasets, the class names of the objects are no longer simple nouns or noun phrases, but rather complex and descriptive, such as "a dog not being held by a leash". For each image in the dataset, any object that matches the description is annotated. The dataset provides annotations such as bounding boxes and finely crafted instance masks. It comprises 422 well-designed descriptions and 24,282 positive object-description pairs.

8 PAPERS • 1 BENCHMARK

RIT-18

The RIT-18 dataset was built for the semantic segmentation of remote sensing imagery. It was collected with the Tetracam Micro-MCA6 multispectral imaging sensor flown on-board a DJI-1000 octocopter.

8 PAPERS • NO BENCHMARKS YET

ReDWeb-S

ReDWeb-S is a large-scale, challenging dataset for Salient Object Detection. It has 3,179 images in total, with various real-world scenes and high-quality depth maps. The dataset is split into a training set with 2,179 RGB-D image pairs and a testing set with the remaining 1,000 image pairs.

8 PAPERS • NO BENCHMARKS YET

SKU110K-R

SKU110K-R is a dataset relabeled with oriented bounding boxes based on SKU110K. It is focused on evaluating oriented and densely packed object detection.
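For readers unfamiliar with oriented bounding boxes, the sketch below shows one common way to turn a rotated box into its four corner points. The `(cx, cy, w, h, angle)` parameterization and the function name are assumptions for illustration; the description does not state which encoding SKU110K-R uses.

```python
import math


def obb_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a box centred at (cx, cy), rotated by angle_deg.

    Assumed convention: the angle is a counter-clockwise rotation in a
    standard x-right, y-up frame; corners are listed starting from the
    unrotated bottom-left and proceeding counter-clockwise.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = w / 2.0, h / 2.0
    corners = []
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h), (half_w, half_h), (-half_w, half_h)):
        # Rotate the offset about the origin, then translate to the centre.
        corners.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return corners


# With zero rotation the result is the ordinary axis-aligned box.
print(obb_corners(0.0, 0.0, 2.0, 1.0, 0.0))
```

An axis-aligned box is simply the special case with an angle of zero, which is why oriented labels can describe tilted, tightly packed products more tightly than the original annotations.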

8 PAPERS • 1 BENCHMARK

TrashCan

The TrashCan dataset is an instance-segmentation dataset of underwater trash. It comprises 7,212 annotated images which contain observations of trash, ROVs, and a wide variety of undersea flora and fauna. The annotations take the form of instance segmentation annotations: bitmaps containing a mask marking which pixels in the image contain each object. The imagery in TrashCan is sourced from the J-EDI (JAMSTEC E-Library of Deep-sea Images) dataset, curated by the Japan Agency for Marine-Earth Science and Technology (JAMSTEC).
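Since a bitmap mask marks exactly which pixels belong to an object, a detection-style bounding box can be derived from it. The sketch below illustrates the idea on a mask stored as a nested list of 0/1 values; the function name and the `[x, y, width, height]` output layout are assumptions for illustration and are not taken from the TrashCan tooling.

```python
def mask_to_bbox(mask):
    """Return the tight [x, y, width, height] box around non-zero pixels.

    `mask` is a list of rows, each a list of 0/1 values.
    Returns None when the mask is empty or has no foreground pixels.
    """
    if not mask or not mask[0]:
        return None
    # Rows and columns that contain at least one foreground pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    x_min, y_min = cols[0], rows[0]
    return [x_min, y_min, cols[-1] - x_min + 1, rows[-1] - y_min + 1]


example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
print(mask_to_bbox(example_mask))
```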

8 PAPERS • NO BENCHMARKS YET

WaterScenes

WaterScenes is a multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces.

8 PAPERS • 2 BENCHMARKS

Bamboo

Bamboo Dataset is a mega-scale and information-dense dataset for both classification and detection pre-training. It is built by integrating 24 public datasets (e.g. ImageNet, Places365, Objects365, OpenImages) and adding new annotations through active learning. Bamboo has 69M image classification annotations and 32M object bounding boxes.

7 PAPERS • NO BENCHMARKS YET

BigDetection

BigDetection is a new large-scale benchmark to build more general and powerful object detection systems. It leverages the training data from existing datasets (LVIS, OpenImages and Objects365) with carefully designed principles, and curates a larger dataset for improved detector pre-training. The BigDetection dataset has 600 object categories and contains 3.4M training images with 36M object bounding boxes.

7 PAPERS • 1 BENCHMARK

Cops-Ref

Cops-Ref is a dataset for visual reasoning in context of referring expression comprehension with two main features.

7 PAPERS • NO BENCHMARKS YET

PFN-PIC (PFN Picking Instructions for Commodities Dataset)

This dataset is a collection of spoken language instructions for a robotic system to pick and place common objects. Text instructions and corresponding object images are provided. The dataset consists of situations where the robot is instructed by the operator to pick up a specific object and move it to another location, for example: "Move the blue and white tissue box to the top right bin." The dataset consists of RGBD images, bounding box annotations, destination box annotations, and text instructions.

7 PAPERS • NO BENCHMARKS YET

SpaceNet 2 (SpaceNet 2: Building Detection v2)

SpaceNet 2: Building Detection v2 is a dataset for building footprint detection in geographically diverse settings from very high resolution satellite images. It contains over 302,701 building footprints and 3/8-band WorldView-3 satellite imagery at 0.3 m pixel resolution across 5 cities (Rio de Janeiro, Las Vegas, Paris, Shanghai, Khartoum), and covers areas that are both urban and suburban in nature. The dataset was split 60%/20%/20% into train/test/validation.
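A 60%/20%/20% train/test/validation split like the one described for SpaceNet 2 can be sketched as below. This is a generic illustration, not the procedure the SpaceNet organizers used: the function name, the random shuffling, and the fixed seed are all assumptions.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle `items` reproducibly and split them 60/20/20.

    Returns (train, test, validation). Integer arithmetic avoids
    floating-point rounding; any remainder goes to the validation part.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_test = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


# Ten hypothetical tile identifiers yield a 6 / 2 / 2 split.
train, test, validation = split_60_20_20(range(10))
print(len(train), len(test), len(validation))
```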

7 PAPERS • 1 BENCHMARK

TTPLA (Transmission Towers and Power Lines)

TTPLA is a public dataset which is a collection of aerial images on Transmission Towers (TTs) and Power Lines (PLs). It can be used for detection and segmentation of transmission towers and power lines. It consists of 1,100 images with the resolution of 3,840×2,160 pixels, as well as manually labelled 8,987 instances of TTs and PLs.

7 PAPERS • NO BENCHMARKS YET

APRICOT

APRICOT is a collection of over 1,000 annotated photographs of printed adversarial patches in public locations. The patches target several object categories for three COCO-trained detection models, and the photos represent natural variation in position, distance, lighting conditions, and viewing angle.

6 PAPERS • NO BENCHMARKS YET

BAAI-VANJEE

BAAI-VANJEE is a dataset for benchmarking and training various computer vision tasks such as 2D/3D object detection and multi-sensor fusion. The BAAI-VANJEE roadside dataset consists of LiDAR data and RGB images collected by a VANJEE smart base station placed on the roadside about 4.5 m high. The dataset contains 2,500 frames of LiDAR data and 5,000 frames of RGB images, including 20% collected at the same time. It also contains 12 classes of objects, 74K 3D object annotations and 105K 2D object annotations.

6 PAPERS • NO BENCHMARKS YET

EuroCity Persons

The EuroCity Persons dataset provides a large number of highly diverse, accurate and detailed annotations of pedestrians, cyclists and other riders in urban traffic scenes. The images for this dataset were collected on-board a moving vehicle in 31 cities of 12 European countries. With over 238,200 person instances manually labeled in over 47,300 images, EuroCity Persons is nearly one order of magnitude larger than person datasets used previously for benchmarking. The dataset furthermore contains a large number of person orientation annotations (over 211,200).

6 PAPERS • NO BENCHMARKS YET

FAT (Falling Things)

Falling Things (FAT) is a dataset for advancing the state-of-the-art in object detection and 3D pose estimation in the context of robotics. It consists of generated photorealistic images with accurate 3D pose annotations for all objects in 60k images.

6 PAPERS • NO BENCHMARKS YET

Freiburg Groceries

Freiburg Groceries is a groceries classification dataset consisting of 5000 images of size 256x256, divided into 25 categories. It has imbalanced class sizes ranging from 97 to 370 images per class. Images were taken in various aspect ratios and padded to squares.

6 PAPERS • NO BENCHMARKS YET

IIIT-AR-13K

IIIT-AR-13K is created by manually annotating the bounding boxes of graphical or page objects in publicly available annual reports. This dataset contains a total of 13k annotated page images with objects in five different popular categories - table, figure, natural image, logo, and signature. It is the largest manually annotated dataset for graphical object detection.

6 PAPERS • NO BENCHMARKS YET

Lytro Illum

Lytro Illum is a new light field dataset using a Lytro Illum camera. 640 light fields are collected with significant variations in terms of size, textureness, background clutter and illumination, etc. Micro-lens image arrays and central viewing images are generated, and corresponding ground-truth maps are produced.

6 PAPERS • NO BENCHMARKS YET

MobilityAids

MobilityAids is a dataset for perception of people and their mobility aids. The annotated dataset contains five classes: pedestrian, person in wheelchair, pedestrian pushing a person in a wheelchair, person using crutches and person using a walking frame. In total the hospital dataset has over 17,000 annotated RGB-D images, containing people categorized according to the mobility aids they use. The images were collected in the facilities of the Faculty of Engineering of the University of Freiburg and in a hospital in Frankfurt.

6 PAPERS • NO BENCHMARKS YET

PS-Battles

The PS-Battles dataset is gathered from a large community of image manipulation enthusiasts and provides a basis for media derivation and manipulation detection in the visual domain. The dataset consists of 102,028 images grouped into 11,142 subsets, each containing the original image as well as a varying number of manipulated derivatives.

6 PAPERS • NO BENCHMARKS YET

CBC (Complete Blood Count)

The complete blood count (CBC) dataset contains 360 blood smear images along with their annotation files, split into training, testing, and validation sets. The training folder contains 300 images with annotations. The testing and validation folders each contain 60 images with annotations. The dataset was prepared by modifying the original dataset, in which some annotation files list far fewer red blood cells (RBCs) than are actually present and one annotation file does not include any RBCs at all although the cell smear image contains them. We removed all the faulty files and split the dataset into three parts. Among the 360 smear images, 300 blood cell images with annotations are used as the training set, and the remaining 60 images with annotations are used as the testing set. Due to the shortage of data, a subset of the training set is used to prepare the validation set, which contains 60 images with annotations.

5 PAPERS • NO BENCHMARKS YET

DUO (Detecting Underwater Objects)

DUO is a dataset for underwater object detection for robot picking. The dataset contains a collection of diverse underwater images with more rational annotations.

5 PAPERS • NO BENCHMARKS YET

Duke Breast Cancer MRI (Dynamic contrast-enhanced magnetic resonance images of breast cancer patients with tumor locations)

Breast MRI scans of 922 cancer patients from Duke University, with tumor bounding box annotations as well as clinical, imaging, and many other features.

5 PAPERS • NO BENCHMARKS YET

HJDataset

HJDataset is a large dataset of Historical Japanese Documents with Complex Layouts. It contains over 250,000 layout element annotations of seven types. In addition to bounding boxes and masks of the content regions, it also includes the hierarchical structures and reading orders for layout elements. The dataset is constructed using a combination of human and machine efforts.

5 PAPERS • NO BENCHMARKS YET

Kitchen Scenes

Kitchen Scenes is a multi-view RGB-D dataset of nine kitchen scenes, each containing several objects in realistic cluttered environments including a subset of objects from the BigBird dataset. The viewpoints of the scenes are densely sampled and objects in the scenes are annotated with bounding boxes and in the 3D point cloud.

5 PAPERS • 1 BENCHMARK

Satlas

Satlas is a remote sensing dataset and benchmark that is large in both breadth, featuring all of the aforementioned applications and more, as well as scale, comprising 290M labels under 137 categories and 7 label modalities.

5 PAPERS • NO BENCHMARKS YET
