no code implementations • 26 Mar 2024 • Hongpeng Pan, Yang Yang, Zhongtian Fu, Yuxuan Zhang, Shian Du, Yi Xu, Xiangyang Ji
To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of the static point in the videos shot by a static camera.
no code implementations • 15 Apr 2023 • Yang Yang, Zhongtian Fu, Xiangyu Wu, Wenjie Li
To address this challenge, in this paper, we experimentally observe that the vision-language divergence may cause the existence of strong and weak modalities, and the hard cross-modal consistency cannot guarantee that strong modal instances' relationships are not affected by weak modality, resulting in the strong modal instances' relationships perturbed despite learned consistent representations. To this end, we propose a novel and directly Coordinated VisionLanguage Retrieval method (dubbed CoVLR), which aims to study and alleviate the desynchrony problem between the cross-modal alignment and single-modal cluster-preserving tasks.