The NUS-WIDE dataset contains 269,648 images with a total of 5,018 tags collected from Flickr. These images are manually annotated with 81 concepts, including objects and scenes.
320 PAPERS • 3 BENCHMARKS
Open Images V4 offers large scale across several dimensions: 30.1M image-level labels for 19.8k concepts, 15.4M bounding boxes for 600 object classes, and 375k visual relationship annotations involving 57 classes. For object detection in particular, 15x more bounding boxes than the next largest datasets (15.4M boxes on 1.9M images) are provided. The images often show complex scenes with several objects (8 annotated objects per image on average). Visual relationships between them are annotated, which support visual relationship detection, an emerging task that requires structured reasoning.
32 PAPERS • 1 BENCHMARK