The Waymo Open Dataset comprises high-resolution sensor data collected by autonomous vehicles operated by the Waymo Driver in a wide variety of conditions.
373 PAPERS • 12 BENCHMARKS
Virtual KITTI is a photo-realistic synthetic video dataset designed for learning and evaluating computer vision models on several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation.
120 PAPERS • 1 BENCHMARK
IDD is a dataset for road scene understanding in unstructured environments, used for semantic segmentation and object detection for autonomous driving. It consists of 10,004 images, finely annotated with 34 classes, collected from 182 drive sequences on Indian roads.
86 PAPERS • NO BENCHMARKS YET
ApolloScape is a large dataset consisting of over 140,000 video frames (73 street scene videos) from various locations in China under varying weather conditions. Pixel-wise semantic annotation of the recorded data is provided in 2D, with point-wise semantic annotation in 3D for 28 classes. In addition, the dataset contains lane marking annotations in 2D.
66 PAPERS • 5 BENCHMARKS
Fisheye cameras are commonly employed for obtaining a large field of view in surveillance, augmented reality and, in particular, automotive applications. Despite their prevalence, there are few public datasets for detailed evaluation of computer vision algorithms on fisheye images. WoodScape is an extensive fisheye automotive dataset named after Robert Wood, who invented the fisheye camera in 1906. WoodScape comprises four surround-view cameras and nine tasks, including segmentation, depth estimation, 3D bounding box detection and soiling detection. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images, and annotations for the other tasks are provided for over 100,000 images.
49 PAPERS • 1 BENCHMARK
KITTI Road is a road and lane estimation benchmark that consists of 289 training and 290 test images. It covers three different categories of road scenes:
* uu - urban unmarked (98/100)
* um - urban marked (95/96)
* umm - urban multiple marked lanes (96/94)
* urban - combination of the three above
Ground truth has been generated by manual annotation of the images and is available for two different road terrain types: road - the road area, i.e., the composition of all lanes; and lane - the ego-lane, i.e., the lane the vehicle is currently driving on (only available for category "um"). Ground truth is provided for the training images only.
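Since the ground truth is distributed as per-pixel road and ego-lane masks, a prediction can be sanity-checked with a simple pixel-wise comparison. The sketch below (plain NumPy, with synthetic stand-in masks) is only an illustration of that idea, not the official evaluation protocol.

```python
import numpy as np

def road_pixel_metrics(pred_mask: np.ndarray, gt_mask: np.ndarray):
    """Pixel-wise precision, recall and F1 for a binary road mask.
    Both arguments are boolean arrays of identical shape (True = road pixel)."""
    tp = np.logical_and(pred_mask, gt_mask).sum()
    fp = np.logical_and(pred_mask, ~gt_mask).sum()
    fn = np.logical_and(~pred_mask, gt_mask).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Synthetic stand-ins for a prediction and a ground-truth mask
# (KITTI camera images are roughly 375 x 1242 pixels).
pred = np.zeros((375, 1242), dtype=bool); pred[200:, :] = True
gt = np.zeros((375, 1242), dtype=bool); gt[210:, :] = True
print(road_pixel_metrics(pred, gt))
```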
37 PAPERS • NO BENCHMARKS YET
The A*3D dataset is a step forward in making autonomous driving safer for pedestrians and the public in the real world. Characteristics:
* 230K human-labeled 3D object annotations in 39,179 LiDAR point cloud frames and corresponding frontal-facing RGB images.
* Captured at different times (day, night) and in different weather conditions (sun, cloud, rain).
34 PAPERS • NO BENCHMARKS YET
The Talk2Car dataset sits at the intersection of various research domains, promoting the development of cross-disciplinary solutions for improving the state of the art in grounding natural language into visual space. The annotations were gathered with the following aspects in mind: free-form, high-quality natural language commands that stimulate the development of solutions that can operate in the wild, and a realistic task setting. Specifically, the authors consider an autonomous driving setting, where a passenger can control the actions of an autonomous vehicle by giving commands in natural language. The Talk2Car dataset was built on top of the nuScenes dataset to include an extensive suite of sensor modalities, i.e. semantic maps, GPS, LIDAR, RADAR and 360-degree RGB images annotated with 3D bounding boxes. This variety of input modalities sets the object referral task on the Talk2Car dataset apart from related challenges, where additional sensor modalities are generally missing.
34 PAPERS • 1 BENCHMARK
ApolloCar3D is a dataset that contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labelled keypoints. The dataset is more than 20 times larger than PASCAL3D+ and KITTI, the current state of the art.
17 PAPERS • 14 BENCHMARKS
The KITTI-Depth dataset includes depth maps from projected LiDAR point clouds that were matched against the depth estimation from the stereo cameras. The depth images are highly sparse, with only 5% of the pixels available and the rest missing. The dataset has 86k training images, 7k validation images, and 1k test images on the benchmark server with no access to the ground truth.
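A minimal sketch of reading one of these sparse maps, assuming the widely used KITTI encoding (16-bit PNG, depth in metres = pixel value / 256, zero = no measurement); the file name below is a placeholder.

```python
import numpy as np
from PIL import Image

def load_sparse_depth(png_path):
    """Read a sparse depth map stored as a 16-bit PNG (assumed KITTI encoding:
    depth in metres = pixel value / 256, a value of 0 means 'no LiDAR return')."""
    raw = np.asarray(Image.open(png_path), dtype=np.uint16)
    depth = raw.astype(np.float32) / 256.0
    valid = raw > 0  # mask of the roughly 5% of pixels that carry a measurement
    return depth, valid

depth, valid = load_sparse_depth("0000000005.png")  # placeholder file name
print(f"valid pixels: {valid.mean():.1%}, mean depth: {depth[valid].mean():.2f} m")
```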
14 PAPERS • NO BENCHMARKS YET
4Seasons is a dataset covering seasonal and challenging perceptual conditions for autonomous driving.
13 PAPERS • NO BENCHMARKS YET
SynWoodScape is a synthetic version of the WoodScape surround-view dataset that addresses many of its weaknesses and extends it. WoodScape comprises four surround-view cameras and nine tasks, including segmentation, depth estimation, 3D bounding box detection, and a novel soiling detection. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images. With WoodScape, we would like to encourage the community to adapt computer vision models to the fisheye camera instead of using naive rectification.
12 PAPERS • NO BENCHMARKS YET
SODA10M is a large-scale object detection benchmark for standardizing the evaluation of different self-supervised and semi-supervised approaches that learn from raw data. SODA10M contains 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images are collected at a rate of one frame every ten seconds across 32 different cities, under different weather conditions, periods of day and location scenes.
11 PAPERS • NO BENCHMARKS YET
BLVD is a large-scale 5D semantics dataset collected by the Visual Cognitive Computing and Intelligent Vehicles Lab. The dataset contains 654 high-resolution video clips comprising 120k frames, captured in Changshu, Jiangsu Province, China, where the Intelligent Vehicle Proving Center of China (IVPCC) is located. The frame rate is 10 fps for both RGB data and 3D point clouds. The dataset contains fully annotated frames, which yield 249,129 3D annotations, 4,902 independent individuals for tracking with an overall trajectory length of 214,922 points, 6,004 valid fragments for 5D interactive event recognition, and 4,900 individuals for 5D intention prediction. These tasks are covered in four kinds of scenarios depending on object density (low and high) and lighting conditions (daytime and nighttime).
9 PAPERS • NO BENCHMARKS YET
ONCE-3DLanes is a real-world autonomous driving dataset with lane layout annotation in 3D space. A dataset annotation pipeline is designed to automatically generate high-quality 3D lane locations from 2D lane annotations by exploiting the explicit relationship between point clouds and image pixels in 211,000 road scenes.
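The authors' pipeline is not reproduced here, but the underlying idea — using the calibrated LiDAR-to-image projection to attach a 3D position to each annotated 2D lane pixel — can be sketched roughly as follows (simplified, with hypothetical inputs; a real pipeline would also fit and smooth the lane in 3D).

```python
import numpy as np

def lift_lane_to_3d(lane_pixels, points_lidar, K, T_cam_from_lidar, px_tol=2.0):
    """Assign 3D coordinates to annotated 2D lane pixels by matching them to
    projected LiDAR points (illustrative simplification of 2D-to-3D lifting).

    lane_pixels:       (M, 2) annotated (u, v) lane pixels
    points_lidar:      (N, 3) LiDAR points in the LiDAR frame
    K:                 (3, 3) camera intrinsic matrix
    T_cam_from_lidar:  (4, 4) extrinsic transform, LiDAR frame -> camera frame
    """
    # Transform points into the camera frame and keep those in front of the camera.
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]

    # Project the remaining points into the image plane.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]

    lifted = []
    for px in lane_pixels:
        d = np.linalg.norm(uv - px, axis=1)
        i = np.argmin(d)
        if d[i] < px_tol:            # accept only projections close to the lane pixel
            lifted.append(pts_cam[i])
    return np.asarray(lifted)        # 3D lane points in the camera frame
```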
DAWN emphasizes a diverse traffic environment (urban, highway and freeway) as well as a rich variety of traffic flow. The dataset comprises a collection of 1,000 images from real traffic environments, divided into four sets of weather conditions: fog, snow, rain and sandstorms. The dataset is annotated with object bounding boxes for autonomous driving and video surveillance scenarios. This data helps in interpreting the effects of adverse weather conditions on the performance of vehicle detection systems.
7 PAPERS • NO BENCHMARKS YET
openDD was annotated using images taken by a drone in 501 separate flights, totalling over 62 hours of trajectory data. As of today, openDD is by far the largest publicly available trajectory dataset recorded from a drone perspective, while comparable datasets span at most 17 hours.
DurLAR is a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near infrared) and reflectivity imagery for multi-modal autonomous driving applications. Compared to existing autonomous driving datasets, DurLAR offers several novel features, including its higher channel count and the accompanying ambient and reflectivity imagery.
5 PAPERS • NO BENCHMARKS YET
PedX is a large-scale multi-modal collection of pedestrians at complex urban intersections. The dataset provides high-resolution stereo images and LiDAR data with manual 2D and automatic 3D annotations. The data was captured using two pairs of stereo cameras and four Velodyne LiDAR sensors.
This self-driving dataset, collected in Brno, Czech Republic, contains data from four WUXGA cameras, two 3D LiDARs, an inertial measurement unit, an infrared camera and, notably, a differential RTK GNSS receiver with centimetre accuracy.
4 PAPERS • NO BENCHMARKS YET
Unsupervised domain adaptation demonstrates great potential to mitigate domain shifts by transferring models from labeled source domains to unlabeled target domains. While it has been applied to a wide variety of complex vision tasks, only a few works focus on lane detection for autonomous driving, which can be attributed to the lack of publicly available datasets. To facilitate research in these directions, we propose CARLANE, a 3-way sim-to-real domain adaptation benchmark for 2D lane detection. CARLANE encompasses the single-target datasets MoLane and TuLane and the multi-target dataset MuLane. These datasets are built from three different domains, which cover diverse scenes and contain a total of 163K unique images, 118K of which are annotated. In addition, we evaluate and report systematic baselines, including our own method, which builds upon Prototypical Cross-domain Self-supervised Learning. We find that the false positive and false negative rates of the evaluated methods are high compared to fully supervised baselines.
3 PAPERS • 3 BENCHMARKS
ELAS is a dataset for lane detection. It contains more than 20 different scenes (in more than 15,000 frames) and considers a variety of scenarios (urban road, highways, traffic, shadows, etc.). The dataset was manually annotated for several events that are of interest for the research community (i.e., lane estimation, change, and centering; road markings; intersections; LMTs; crosswalks and adjacent lanes).
3 PAPERS • NO BENCHMARKS YET
The MUAD dataset (Multiple Uncertainties for Autonomous Driving) consists of 10,413 realistic synthetic images with diverse adverse weather conditions (night, fog, rain, snow), out-of-distribution objects, and annotations for semantic segmentation, depth estimation, object detection, and instance detection. Predictive uncertainty estimation is essential for the safe deployment of deep neural networks in real-world autonomous systems, and MUAD allows a better assessment of the impact of different sources of uncertainty on model performance.
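As one concrete (and entirely generic) example of a predictive uncertainty score that such a dataset can be used to study, the sketch below computes the per-pixel entropy of a segmentation network's softmax output; MUAD itself does not prescribe this particular measure.

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Per-pixel predictive entropy of softmax outputs, a common (though by no
    means the only) uncertainty score for semantic segmentation.

    probs: (H, W, C) class probabilities summing to 1 over the last axis.
    """
    eps = 1e-12  # avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

# Toy example: a 2x2 image with 3 classes; the second row is maximally uncertain.
probs = np.array([[[0.98, 0.01, 0.01], [0.90, 0.05, 0.05]],
                  [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]])
print(predictive_entropy(probs))
```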
UNDD consists of 7,125 unlabelled day and night images; additionally, it has 75 night images with pixel-level annotations whose classes are equivalent to those of the Cityscapes dataset.
SODA-D is a large-scale dataset tailored for small object detection in driving scenarios. It is built on top of the MVD dataset and the authors' own data, where the former is dedicated to pixel-level understanding of street scenes and the latter is mainly captured by onboard cameras and mobile phones. With 24,704 well-chosen, high-quality images of driving scenarios, SODA-D comprises 277,596 instances of 9 categories annotated with horizontal bounding boxes.
2 PAPERS • 1 BENCHMARK
StereoMSI comprises 350 registered colour-spectral image pairs. The dataset has been used for the two tracks of the PIRM2018 challenge.
2 PAPERS • NO BENCHMARKS YET
The TCG dataset is used to evaluate traffic control gesture recognition for autonomous driving. It is based on 3D body skeleton input and supports traffic control gesture classification at every time step. The dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence.
Talk2Nav is a large-scale dataset with verbal navigation instructions.
TuSimple Lane is an extension of the TuSimple dataset with 14,336 lane boundary annotations. Each lane boundary in the dataset is annotated with one of 7 classes, such as "Single Dashed", "Double Dashed" or "Single White Continuous".
The Apron Dataset focuses on training and evaluating classification and detection models for airport-apron logistics. In addition to bounding boxes and object categories, the dataset is enriched with meta parameters to quantify the models' robustness against environmental influences.
1 PAPER • NO BENCHMARKS YET
The Autonomous-driving StreAming Perception (ASAP) benchmark evaluates the online performance of vision-centric perception in autonomous driving. It extends the 2Hz annotated nuScenes dataset by generating high-frame-rate labels for the 12Hz raw images.
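The paper defines its own annotation-extension procedure; purely to illustrate the general idea of densifying 2 Hz keyframe labels to 12 Hz, the sketch below linearly interpolates a 3D box centre between two keyframes (a simplification: a real pipeline must also handle orientation, occlusion, and objects appearing or disappearing between keyframes).

```python
import numpy as np

def interpolate_centers(t0, c0, t1, c1, t_query):
    """Linearly interpolate a 3D box centre between two annotated keyframes.

    t0, t1:   timestamps of the surrounding 2 Hz keyframes (seconds)
    c0, c1:   (3,) box centres at those keyframes
    t_query:  timestamp of an intermediate 12 Hz frame
    """
    alpha = (t_query - t0) / (t1 - t0)
    return (1.0 - alpha) * np.asarray(c0) + alpha * np.asarray(c1)

# A 12 Hz frame one sixth of the way between two 2 Hz keyframes.
print(interpolate_centers(0.0, [10.0, 2.0, 0.5], 0.5, [13.0, 2.0, 0.5], 1 / 12))
```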
Panoramic Video Panoptic Segmentation Dataset is a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving. The dataset has labels for 28 semantic categories and 2,860 temporal sequences that were captured by five cameras mounted on autonomous vehicles driving in three different geographical locations, leading to a total of 100k labeled camera images.
Situated Dialogue Navigation (SDN) is a navigation benchmark of 183 trials with a total of 8415 utterances, around 18.7 hours of control streams, and 2.9 hours of trimmed audio. SDN is developed to evaluate the agent's ability to predict dialogue moves from humans as well as generate its own dialogue moves and physical navigation actions.
TAS-NIR is a VIS+NIR dataset of semantically annotated images in unstructured outdoor environments. It consists of 209 VIS+NIR image pairs with a fine-grained semantic segmentation.
H3D (Humans in 3D) is a dataset of annotated people.
0 PAPER • NO BENCHMARKS YET