Visual Genome contains Visual Question Answering data in a multiple-choice setting. It consists of 101,174 images from MSCOCO with 1.7 million QA pairs, an average of 17 questions per image. Compared to the Visual Question Answering dataset, Visual Genome has a more balanced distribution over six question types: What, Where, When, Who, Why, and How. The Visual Genome dataset also provides 108K images densely annotated with objects, attributes, and relationships.
1,137 PAPERS • 19 BENCHMARKS
The ImageCLEF-DA dataset is a benchmark for the ImageCLEF 2014 domain adaptation challenge. It contains three domains: Caltech-256 (C), ImageNet ILSVRC 2012 (I), and Pascal VOC 2012 (P). Each domain has 12 categories with 50 images per category, for 600 images per domain.
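As a rough illustration of the sizes implied by the ImageCLEF-DA description, the following sketch computes the image counts and enumerates ordered source-to-target domain pairs. Treating every ordered pair as a transfer task is an assumption made here for illustration, not something the description states, and the constant and variable names are hypothetical.

```python
from itertools import permutations

# Domain codes and names as given in the dataset description.
DOMAINS = {
    "C": "Caltech-256",
    "I": "ImageNet ILSVRC 2012",
    "P": "Pascal VOC 2012",
}
CATEGORIES_PER_DOMAIN = 12
IMAGES_PER_CATEGORY = 50

# Sizes that follow directly from the description.
images_per_domain = CATEGORIES_PER_DOMAIN * IMAGES_PER_CATEGORY
total_images = images_per_domain * len(DOMAINS)

# Assumption: each ordered (source, target) pair of distinct domains
# is treated as one transfer task.
transfer_tasks = [f"{src}->{tgt}" for src, tgt in permutations(DOMAINS, 2)]

print(images_per_domain, total_images, transfer_tasks)
```

Under that assumption, three domains give six ordered pairs, each with 600 source images and 600 target images.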
91 PAPERS • 5 BENCHMARKS
The dataset covers toy tasks that a human teaches to a robot. The number of repetitions per task is limited, since the human is expected to demonstrate each task to the robot only a few times.
1 PAPER • NO BENCHMARKS YET