no code implementations • 8 Sep 2023 • Hongyu Hu, Tiancheng Lin, Jie Wang, Zhenbang Sun, Yi Xu
To achieve this, we introduce a pre-trained LLM to generate context descriptions, and we encourage the prompts to learn from the LLM's knowledge through alignment, as well as through alignment between the prompts and local image features.
no code implementations • 5 Sep 2023 • Hongyu Hu, Jiyuan Zhang, Minyi Zhao, Zhenbang Sun
Research on Large Vision-Language Models (LVLMs) has advanced significantly thanks to the success of Large Language Models (LLMs).
1 code implementation • 26 May 2022 • Minghao Xu, Yuanfan Guo, Xuanyu Zhu, Jiawen Li, Zhenbang Sun, Jian Tang, Yi Xu, Bingbing Ni
This framework aims to learn multiple semantic representations for each image, and these representations are structured to encode image semantics from fine-grained to coarse-grained.
2 code implementations • CVPR 2022 • Yuanfan Guo, Minghao Xu, Jiawen Li, Bingbing Ni, Xuanyu Zhu, Zhenbang Sun, Yi Xu
In this framework, a set of hierarchical prototypes is constructed and dynamically updated to represent the hierarchical semantic structures underlying the data in the latent space.
1 code implementation • ICCV 2021 • Minghao Xu, Hang Wang, Bingbing Ni, Riheng Zhu, Zhenbang Sun, Changhu Wang
To tackle this practical problem, we propose a Dual-Learner-based Video Highlight Detection (DL-VHD) framework.