no code implementations • 3 Mar 2024 • Huixuan Zhang, Junzhe Zhang, Xiaojun Wan
Large-scale vision-language models have demonstrated impressive skill in handling tasks that involve both vision and language.
no code implementations • 29 Feb 2024 • Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Xiaojun Wan
News image captioning requires a model to generate an informative, entity-rich caption from a news image and its associated news article.
1 code implementation • 1 Jul 2023 • Huixuan Zhang, Xiaojun Wan
We create a multimodal detection dataset from Weibo (a Chinese social media platform) and carry out several studies on it.