Search Results for author: Yana Wei

Found 2 papers, 0 papers with code

Merlin: Empowering Multimodal LLMs with Foresight Minds

no code implementations • 30 Nov 2023 • En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao

Then, Foresight Instruction-Tuning (FIT) requires MLLMs to first predict the trajectories of related objects and then reason about potential future events based on them.

Visual Question Answering

Grounded Image Text Matching with Mismatched Relation Reasoning

no code implementations • ICCV 2023 • Yu Wu, Yana Wei, Haozhe Wang, Yongfei Liu, Sibei Yang, Xuming He

This paper introduces Grounded Image Text Matching with Mismatched Relation (GITM-MR), a novel visual-linguistic joint task that evaluates the relation understanding capabilities of transformer-based pre-trained models.

Image-text matching, Relation
