no code implementations • 18 Apr 2024 • Han Fang, Xianghao Zang, Chao Ban, Zerun Feng, Lanxiang Zhou, Zhongjiang He, Yongxiang Li, Hao Sun
Text-video retrieval aims to find the most relevant cross-modal samples for a given query.
1 code implementation • 26 May 2023 • Zheng Li, Caili Guo, Xin Wang, Zerun Feng, Yanjun Wang
Given a query caption, the goal is to rank candidate images in descending order of relevance.
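To illustrate the ranking step, here is a minimal sketch. It assumes the caption and the images have already been embedded in a shared vector space and that relevance is measured by cosine similarity. Both the embeddings and the helper names (`cosine`, `rank_candidates`) are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_candidates(query: Sequence[float], candidates: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from most to least relevant.

    Assumption: relevance = cosine similarity of pre-computed embeddings.
    Ties keep their original order because Python's sort is stable.
    """
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


query = [1.0, 0.0]                              # hypothetical caption embedding
images = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]   # hypothetical image embeddings
print(rank_candidates(query, images))
```

With these toy vectors the second image is nearly parallel to the query, so it ranks first and the orthogonal image ranks last.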
no code implementations • 1 Mar 2023 • Zheng Li, Caili Guo, Xin Wang, Zerun Feng, Zhongtian Du
To alleviate the gradient vanishing problem, we propose a Selectively Hard Negative Mining (SelHN) strategy, which chooses whether to mine hard negative samples according to the gradient vanishing condition.
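The idea of switching selectively between hard-negative mining and using all negatives can be sketched as follows. This is not the SelHN rule from the paper. The switching test (`hardest_below_positive`) is a hypothetical stand-in for the paper's gradient-vanishing condition, and only the image-to-text direction is shown for brevity. `sim[i][j]` is assumed to be the similarity between image `i` and caption `j`, with matching pairs on the diagonal.

```python
from typing import Callable, List, Sequence

Condition = Callable[[float, List[float]], bool]


def hinge(x: float) -> float:
    return max(0.0, x)


def hardest_below_positive(pos: float, negs: List[float]) -> bool:
    # HYPOTHETICAL switching test, not the paper's condition:
    # mine the hardest negative only once the positive already outscores it.
    return max(negs) < pos


def selective_triplet_loss(sim: Sequence[Sequence[float]], margin: float = 0.2,
                           use_hard: Condition = hardest_below_positive) -> float:
    """Per-anchor choice between hardest-negative and all-negative hinge terms."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        negs = [sim[i][j] for j in range(n) if j != i]
        if use_hard(pos, negs):
            # Hard-negative mining: only the most violating negative contributes.
            total += hinge(margin - pos + max(negs))
        else:
            # Fallback: every negative contributes, giving a denser gradient signal.
            total += sum(hinge(margin - pos + s) for s in negs)
    return total / n
```

Passing `use_hard=lambda pos, negs: True` recovers plain hardest-negative mining. Passing `lambda pos, negs: False` recovers the sum over all negatives. This makes it easy to compare the two regimes on the same similarity matrix.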
no code implementations • 20 Oct 2022 • Zheng Li, Caili Guo, Zerun Feng, Jenq-Neng Hwang, Ying Jin, Yufeng Zhang
Such a binary indicator covers only a limited subset of image-text semantic relations, which is insufficient to represent relevance degrees between images and texts described by continuous labels such as image captions.
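The difference between a binary indicator and graded relevance can be made concrete with a toy example. The graded score below is word-level Jaccard overlap between the captions of two pairs. It is a hypothetical proxy chosen only for illustration and is not the labeling method of the paper. Row `i` stands for image `i`, represented here by its ground-truth caption.

```python
from typing import List, Sequence


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two captions (hypothetical relevance proxy)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def binary_labels(n: int) -> List[List[float]]:
    """1 for the annotated image-caption pair, 0 for everything else."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def graded_labels(captions: Sequence[str]) -> List[List[float]]:
    """Continuous relevance in [0, 1] derived from caption overlap."""
    return [[jaccard(ci, cj) for cj in captions] for ci in captions]


captions = ["a dog runs on grass", "a dog plays on grass", "a red car parked outside"]
print(binary_labels(3)[0])    # the two non-paired captions are equally irrelevant
print(graded_labels(captions)[0])
```

Under the binary indicator, "a dog plays on grass" and "a red car parked outside" are treated as equally irrelevant to the first image. The graded proxy assigns the first of these a much higher score (4/6 versus 1/9).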
no code implementations • 28 Sep 2022 • Zheng Li, Caili Guo, Xin Wang, Zerun Feng, Jenq-Neng Hwang, Zhongtian Du
More specifically, Triplet loss with Hard Negative mining (Triplet-HN), which is widely used in existing retrieval models to improve discriminative ability, is prone to falling into local minima during training.
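For reference, here is a sketch of one common Triplet-HN formulation, the bidirectional max-of-hinges loss popularized by VSE++. Individual retrieval models may differ in margin, normalization, or direction. The function name and the similarity-matrix layout (`sim[i][j]` = image `i` vs caption `j`, positives on the diagonal) are assumptions for illustration.

```python
from typing import Sequence


def triplet_hn_loss(sim: Sequence[Sequence[float]], margin: float = 0.2) -> float:
    """Bidirectional triplet loss keeping only the hardest negative per anchor."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Image i as anchor: hardest non-matching caption in row i.
        hardest_caption = max(sim[i][j] for j in range(n) if j != i)
        # Caption i as anchor: hardest non-matching image in column i.
        hardest_image = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin - pos + hardest_caption)
        total += max(0.0, margin - pos + hardest_image)
    return total  # summed over the batch, as in VSE++
```

Because only the single hardest negative contributes, each anchor receives gradient from at most one negative per direction. This sparse signal is one intuition for why such training can stall, which is the behavior the excerpt above refers to.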
no code implementations • 16 Jun 2020 • Zerun Feng, Zhimin Zeng, Caili Guo, Zheng Li
Finally, the region features are aggregated to form frame-level features for further encoding to measure video-text similarity.
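The region-to-frame aggregation step can be sketched as attention-style weighted pooling. With no scores it reduces to plain mean pooling. The pooling scheme, the optional per-region `scores`, and the helper names are assumptions. The excerpt does not say which aggregation the paper uses.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def aggregate_regions(regions: Sequence[Sequence[float]],
                      scores: Optional[Sequence[float]] = None) -> Vector:
    """Pool one frame's region features into a single frame-level vector.

    Assumption: softmax-weighted average over regions; with scores=None every
    region gets the same weight, i.e. mean pooling.
    """
    if not regions:
        raise ValueError("a frame needs at least one region feature")
    if scores is None:
        scores = [0.0] * len(regions)
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]


def frame_features(video: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Map each frame's region features to one vector (hypothetical helper)."""
    return [aggregate_regions(frame) for frame in video]


video = [[[1.0, 2.0], [3.0, 4.0]],   # frame 0: two regions
         [[0.0, 0.0]]]               # frame 1: one region
print(frame_features(video))         # [[2.0, 3.0], [0.0, 0.0]]
```

The resulting per-frame vectors would then go to a temporal encoder before computing video-text similarity. That encoder is outside the scope of this sketch.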