no code implementations • 6 May 2024 • Li Mi, Xianjie Dai, Javiera Castillo-Navarro, Devis Tuia
As a matching-based task, cross-modal text-image retrieval often suffers from information asymmetry between texts and images.
no code implementations • 20 Mar 2024 • Li Mi, Chang Xu, Javiera Castillo-Navarro, Syrielle Montariol, Wen Yang, Antoine Bosselut, Devis Tuia
Cross-view geo-localization aims at localizing a ground-level query image by matching it to its corresponding geo-referenced aerial view.
no code implementations • 20 Feb 2024 • Li Mi, Syrielle Montariol, Javiera Castillo-Navarro, Xianjie Dai, Antoine Bosselut, Devis Tuia
Generating focused questions using textual constraints while enforcing a high relevance to the image content remains a challenge, as visual question generation (VQG) systems often ignore one or both forms of grounding.
no code implementations • CVPR 2022 • Yangjun Ou, Li Mi, Zhenzhong Chen
By combining an object-level graph (OG) and a relation-level graph (RG), the proposed OR2G simultaneously captures the attribute transitions of objects and reasons about the relationship transitions between objects.
no code implementations • 6 Jul 2021 • Leitian Tao, Li Mi, Nannan Li, Xianhang Cheng, Yaosi Hu, Zhenzhong Chen
For a typical Scene Graph Generation (SGG) method, there is often a large performance gap between the head and tail classes of predicates.
no code implementations • 2 Jul 2021 • Li Mi, Yangjun Ou, Zhenzhong Chen
To evaluate the VRF task, we introduce two video datasets named VRF-AG and VRF-VidOR, with a series of spatio-temporally localized visual relation annotations in a video.
no code implementations • CVPR 2020 • Li Mi, Zhenzhong Chen
The object-level graph aims to capture the interactions between objects, while the triplet-level graph models the dependencies among relation triplets.