no code implementations • 16 Feb 2024 • Jihyung Kil, Farideh Tavazoee, Dongyeop Kang, Joo-Kyung Kim
II-MMR then analyzes this path to identify different reasoning cases in current VQA benchmarks by estimating how many hops and what types (i.e., visual or beyond-visual) of reasoning are required to answer the question.
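The idea of categorizing a reasoning path by hop count and reasoning type can be illustrated with a toy sketch. This is purely hypothetical and not II-MMR's actual implementation: it splits a generated rationale into steps and labels each step with an assumed keyword heuristic.

```python
# Illustrative sketch only -- not the paper's method. The cue list and the
# sentence-splitting heuristic are assumptions made for this example.
VISUAL_CUES = ("see", "color", "shape", "left", "right", "wearing", "holding")

def analyze_reasoning_path(path: str) -> dict:
    """Estimate the number of reasoning hops (one per step) and whether
    each step needs visual or beyond-visual (e.g., commonsense) reasoning."""
    steps = [s.strip() for s in path.split(".") if s.strip()]
    types = ["visual" if any(cue in s.lower() for cue in VISUAL_CUES)
             else "beyond-visual"
             for s in steps]
    return {"hops": len(steps), "types": types}

result = analyze_reasoning_path(
    "The man is holding an umbrella. Umbrellas are used when it rains."
)
# A two-hop path: one visual step, one beyond-visual step.
```

A real system would use a model rather than keywords, but the output structure (hop count plus per-step reasoning type) mirrors the analysis described in the abstract.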
no code implementations • 6 Feb 2024 • Jihyung Kil, Chan Hee Song, Boyuan Zheng, Xiang Deng, Yu Su, Wei-Lun Chao
Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites.
1 code implementation • 3 Jan 2024 • Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su
The recent development of large multimodal models (LMMs), especially GPT-4V(ision) and Gemini, has been quickly expanding the capability boundaries of multimodal models beyond traditional tasks like image captioning and visual question answering.
no code implementations • ICCV 2023 • Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut
The ability to recognize and reason about text embedded in visual inputs is often lacking in vision-and-language (V&L) models, perhaps because V&L pre-training methods have often failed to include such an ability in their training objective.
1 code implementation • CVPR 2022 • Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao, Yu Su
We study the problem of developing autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task.
1 code implementation • EMNLP 2021 • Jihyung Kil, Cheng Zhang, Dong Xuan, Wei-Lun Chao
We found that many of the "unknowns" to the learned VQA model are indeed "known" in the dataset implicitly.
1 code implementation • NAACL 2021 • Jihyung Kil, Wei-Lun Chao
Zero-shot learning aims to recognize unseen objects using their semantic representations.
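The core mechanism behind recognizing unseen objects from semantic representations can be sketched in a few lines. This is a generic illustration of the zero-shot setup, not the paper's specific model; the attribute vectors below are made up for the example.

```python
import math

# Hypothetical "semantic representations" (e.g., attribute vectors) for
# three classes; classes 1 and 2 are assumed to have no training images.
CLASS_SEMANTICS = [
    [1.0, 0.0, 1.0],  # class 0 (seen)
    [0.0, 1.0, 1.0],  # class 1 (unseen)
    [1.0, 1.0, 0.0],  # class 2 (unseen)
]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_predict(image_embedding):
    """Assign the class whose semantic vector is most similar to the
    image embedding. Unseen classes remain predictable because their
    semantics are known even without any training images."""
    sims = [cosine(image_embedding, sem) for sem in CLASS_SEMANTICS]
    return max(range(len(sims)), key=sims.__getitem__)
```

An embedding that aligns with class 1's attributes is labeled class 1 even though that class was never observed during training; this similarity-in-semantic-space step is what makes the recognition "zero-shot".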