The large-scale MUSIC-AVQA dataset of musical performance contains 45,867 question-answer pairs, distributed in 9,288 videos for over 150 hours. All QA pairs types are divided into 3 modal scenarios, which contain 9 question types and 33 question templates. Finally, as an open-ended problem of our AVQA tasks, all 42 kinds of answers constitute a set for selection.
21 PAPERS • 1 BENCHMARK
The RailEye3D dataset, a collection of train-platform scenarios for applications targeting passenger safety and automation of train dispatching, consists of 10 image sequences captured at 6 railway stations in Austria. Annotations for multi-object tracking are provided in both an unified format as well as the ground-truth format used in the MOTChallenge.
3 PAPERS • NO BENCHMARKS YET
The VideoNavQA dataset contains pairs of questions and videos generated in the House3D environment. The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the Embodied Question Answering (EQA) task.
The Apron Dataset focuses on training and evaluating classification and detection models for airport-apron logistics. In addition to bounding boxes and object categories the dataset is enriched with meta parameters to quantify the models’ robustness against environmental influences.
1 PAPER • NO BENCHMARKS YET