1 code implementation • 14 Mar 2024 • Zhixuan Shen, Haonan Luo, Sijia Li, Tianrui Li
Scene-Text Visual Question Answering (ST-VQA) aims to understand scene text in images and answer questions related to the text content.
Optical Character Recognition (OCR)
no code implementations • ICCV 2019 • Haonan Luo, Guosheng Lin, Zichuan Liu, Fayao Liu, Zhenmin Tang, Yazhou Yao
Then, guided by the extracted semantic features, a bottom-up visual attention mechanism is proposed for the Visual Question Answering (VQA) sub-task.