no code implementations • 20 Feb 2023 • Litian Zhang, XiaoMing Zhang, Ziming Guo, Zhipeng Liu
The visual description and text content are then fused to generate a textual summary that captures the semantics of the multimodal content, and the most relevant image is selected as the visual summary.
no code implementations • 27 Dec 2022 • Huadeng Wang, Zhipeng Liu, Rushi Lan, Zhenbing Liu, Xiaonan Luo, Xipeng Pan, Bingbing Li
In addition, the model achieves good performance on the GZMH dataset, which was prepared by our group and will first be released with the publication of this paper.