A GQA-based dataset with 1,040,830 multi-modal explanations of visual reasoning processes.
7 PAPERS • 1 BENCHMARK