The Talk2Car dataset finds itself at the intersection of various research domains, promoting the development of cross-disciplinary solutions for improving the state-of-the-art in grounding natural language into visual space. The annotations were gathered with the following aspects in mind: Free-form high quality natural language commands, that stimulate the development of solutions that can operate in the wild. A realistic task setting. Specifically, the authors consider an autonomous driving setting, where a passenger can control the actions of an Autonomous Vehicle by giving commands in natural language. The Talk2Car dataset was build on top of the nuScenes dataset to include an extensive suite of sensor modalities, i.e. semantic maps, GPS, LIDAR, RADAR and 360-degree RGB images annotated with 3D bounding boxes. Such variety of input modalities sets the object referral task on the Talk2Car dataset apart from related challenges, where additional sensor modalities are generally missing
33 PAPERS • 1 BENCHMARK
RadarScenes is a real-world radar point cloud dataset for automotive applications.
22 PAPERS • NO BENCHMARKS YET
ApolloCar3DT is a dataset that contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labelled keypoints. This dataset is above 20 times larger than PASCAL3D+ and KITTI, the current state-of-the-art.
17 PAPERS • 14 BENCHMARKS
DADA-2000 is a large-scale benchmark with 2000 video sequences (named as DADA-2000) is contributed with laborious annotation for driver attention (fixation, saccade, focusing time), accident objects/intervals, as well as the accident categories, and superior performance to state-of-the-arts are provided by thorough evaluations.
13 PAPERS • NO BENCHMARKS YET
The EuroCity Persons dataset provides a large number of highly diverse, accurate and detailed annotations of pedestrians, cyclists and other riders in urban traffic scenes. The images for this dataset were collected on-board a moving vehicle in 31 cities of 12 European countries. With over 238,200 person instances manually labeled in over 47,300 images, EuroCity Persons is nearly one order of magnitude larger than person datasets used previously for benchmarking. The dataset furthermore contains a large number of person orientation annotations (over 211,200).
6 PAPERS • NO BENCHMARKS YET
An annotated dataset is released to enable dynamic scene classification that includes 80 hours of diverse high quality driving video data clips collected in the San Francisco Bay area. The dataset includes temporal annotations for road places, road types, weather, and road surface conditions.
4 PAPERS • NO BENCHMARKS YET
EyeCar is a dataset of driving videos of vehicles involved in rear-end collisions paired with eye fixation data captured from human subjects. It contains 21 front-view videos that were captured in various traffic, weather, and day light conditions. Each video is 30sec in length and contains typical driving tasks (e.g., lanekeeping, merging-in, and braking) ending to rear-end collisions.
2 PAPERS • NO BENCHMARKS YET
The TCG dataset is used to evaluate Traffic Control Gesture recognition for autonomous driving. The dataset is based on 3D body skeleton input to perform traffic control gesture classification on every time step. The dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence.
2 PAPERS • 1 BENCHMARK
TuSimple Lane is an extension of the TuSimple dataset with 14,336 lane boundaries annotations. Each lane boundary in the dataset is annotated using 7 different classes such as “Single Dashed”, “Double Dashed” or “Single White Continuous”.
Panoramic Video Panoptic Segmentation Dataset is a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving. The dataset has labels for 28 semantic categories and 2,860 temporal sequences that were captured by five cameras mounted on autonomous vehicles driving in three different geographical locations, leading to a total of 100k labeled camera images.
1 PAPER • NO BENCHMARKS YET