TextVQA

Introduced by Singh et al. in Towards VQA Models That Can Read

TextVQA is a dataset for benchmarking visual reasoning based on text in images. To answer its questions, models must read the text present in an image, incorporate it as an additional modality alongside the visual content, and reason over it.
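
A minimal sketch of iterating over the released annotations is shown below; the file name and record fields ("data", "question", "image_id", "answers") are assumptions based on the commonly distributed TextVQA JSON format and should be checked against the actual download.

```python
import json

# Hypothetical annotation file name; adjust to the split you downloaded.
with open("TextVQA_0.5.1_val.json") as f:
    annotations = json.load(f)["data"]  # assumed top-level "data" list

for record in annotations[:3]:
    print(record["image_id"], "-", record["question"])
    # Each question typically comes with 10 human-provided answers.
    print("  reference answers:", record["answers"])
```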

Statistics

* 28,408 images from OpenImages
* 45,336 questions
* 453,360 ground truth answers
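
With ten human answers per question, TextVQA is typically scored with the VQA-style soft accuracy, in which a predicted answer receives full credit if at least three annotators gave it. Below is a minimal sketch of the simplified form of that metric; the official evaluation additionally normalizes answers (case, punctuation, articles) and averages over subsets of the reference answers, which this sketch omits.

```python
def soft_vqa_accuracy(prediction: str, reference_answers: list[str]) -> float:
    """Simplified VQA-style accuracy: full credit if >= 3 reference answers
    match the prediction, partial credit (matches / 3) otherwise."""
    pred = prediction.strip().lower()
    matches = sum(ans.strip().lower() == pred for ans in reference_answers)
    return min(matches / 3.0, 1.0)

# Hypothetical example with 10 reference answers: "stop" appears 4 times,
# "stop sign" appears 2 times.
refs = ["stop", "stop", "stop", "stop", "stop sign",
        "sign", "red sign", "stop sign", "halt", "stopping"]
print(soft_vqa_accuracy("stop", refs))       # 1.0
print(soft_vqa_accuracy("stop sign", refs))  # ~0.67
```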
