ParsVQA-Caps: A Benchmark for Visual Question Answering and Image Captioning in Persian

WiNLP2022 2022 · Shaghayegh Mobasher, Ghazal Zamaninejad, Maryam Hashemi, Melika Nobakhtian, Sauleh Eetemadi ·

Despite recent advances in vision-and-language tasks, most progress is still focused on resource-rich languages such as English. Furthermore, widespread vision-and-language datasets directly adopt images representative of American or European cultures resulting in bias. Hence we introduce ParsVQA-Caps, the first benchmark in Persian for Visual Question Answering and Image Captioning tasks. We utilize two ways to collect datasets for each task, human-based and template-based for VQA and human-based and web-based for image captioning.

PDF