1 code implementation • 15 Apr 2024 • Yaohui Li, Qifeng Zhou, Haoxing Chen, Jianbing Zhang, Xinyu Dai, Hao Zhou
Few-shot learning aims to further enhance the transfer capability of CLIP by providing a few images per class, known as 'few shots'.
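As a rough illustration of the few-shot setting, the sketch below classifies a query by comparing it to per-class prototypes averaged from a handful of labeled examples. The toy 2-D vectors stand in for image embeddings (e.g. CLIP features); they and the class names are hypothetical, not from the paper.

```python
# Prototype-based few-shot classification over precomputed embeddings.
# Each class prototype is the mean of its few shots; a query is
# assigned to the nearest prototype by cosine similarity.
import math

def mean_vec(vecs):
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(query, shots_by_class):
    prototypes = {c: mean_vec(v) for c, v in shots_by_class.items()}
    return max(prototypes, key=lambda c: cosine(query, prototypes[c]))

# Two classes, two "shots" each (toy embeddings).
shots = {
    "cat": [[1.0, 0.1], [0.9, 0.2]],
    "dog": [[0.1, 1.0], [0.2, 0.8]],
}
print(classify([0.95, 0.15], shots))  # -> cat
```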
no code implementations • 23 Mar 2024 • Lingxing Kong, Yougang Chu, Zheng Ma, Jianbing Zhang, Liang He, Jiajun Chen
Relation extraction is a critical task in the field of natural language processing with numerous real-world applications.
no code implementations • 18 Feb 2024 • Zheng Ma, Changxin Wang, Yawen Ouyang, Fei Zhao, Jianbing Zhang, ShuJian Huang, Jiajun Chen
If a certain metric has flaws, it will be exploited by the model and reflected in the generated sentences.
no code implementations • 15 Feb 2024 • Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, WeiHao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai
Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination.
1 code implementation • 17 Jan 2024 • Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu
In our preliminary study, we have discovered a key challenge in developing visual GUI agents: GUI grounding -- the capacity to accurately locate screen elements based on instructions.
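To make the GUI grounding task concrete, here is a minimal toy: given a natural-language instruction, pick the screen element whose label best matches it and return its bounding box. A visual GUI agent grounds directly on screenshot pixels; the element list, labels, and coordinates below are stand-ins for illustration only.

```python
# Toy GUI grounding: select the element whose label shares the most
# words with the instruction, and return that element's bounding box.
def ground(instruction, elements):
    words = set(instruction.lower().split())
    def overlap(el):
        return len(words & set(el["label"].lower().split()))
    return max(elements, key=overlap)["bbox"]

# Hypothetical accessibility-tree elements with (x1, y1, x2, y2) boxes.
elements = [
    {"label": "search button", "bbox": (10, 10, 40, 30)},
    {"label": "settings menu", "bbox": (50, 10, 80, 30)},
]
print(ground("click the search button", elements))  # -> (10, 10, 40, 30)
```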
1 code implementation • 23 Oct 2023 • Fei Zhao, Chunhui Li, Zhen Wu, Yawen Ouyang, Jianbing Zhang, Xinyu Dai
In this work, we focus on whether the negative impact of noisy images can be reduced without modifying the data.
1 code implementation • 15 Oct 2023 • Zheng Ma, Changxin Wang, Bo Huang, Zixuan Zhu, Jianbing Zhang
Several models have adopted a non-autoregressive decoding manner to speed up caption generation.
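The autoregressive/non-autoregressive contrast can be sketched with stub predictors: AR decoding emits one token per step conditioned on the growing prefix, while NAR decoding fills every position independently in a single parallel step. The stub "model" below is purely illustrative; only the dependency structure matters.

```python
# Autoregressive vs. non-autoregressive decoding (dependency structure only).
def ar_decode(predict_next, length):
    seq = []
    for _ in range(length):              # 'length' sequential steps
        seq.append(predict_next(tuple(seq)))
    return seq

def nar_decode(predict_pos, length):
    # All positions predicted independently -> one parallel step.
    return [predict_pos(i) for i in range(length)]

vocab = ["a", "cat", "sits"]
print(ar_decode(lambda prefix: vocab[len(prefix)], 3))  # -> ['a', 'cat', 'sits']
print(nar_decode(lambda i: vocab[i], 3))                # -> ['a', 'cat', 'sits']
```

The speedup comes from the second form: without the prefix dependency, all positions can be computed at once on parallel hardware, at the cost of modeling inter-token dependencies.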
1 code implementation • 9 Oct 2023 • Shangyu Xing, Fei Zhao, Zhen Wu, Chunhui Li, Jianbing Zhang, Xinyu Dai
Multimodal Entity Linking (MEL) is a task that aims to link ambiguous mentions within multimodal contexts to referential entities in a multimodal knowledge base.
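A schematic version of MEL: score each knowledge-base candidate by combining a text-similarity term and an image-similarity term, then link the mention to the top-scoring entity. The similarity functions, the weight `alpha`, and the toy "Jaguar" knowledge base are all illustrative assumptions, not the paper's model.

```python
# Schematic multimodal entity linking over a toy knowledge base.
def text_sim(a, b):
    # Crude token-overlap (Jaccard) similarity between two strings.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def img_sim(a, b):
    # Stand-in for visual similarity: 1.0 iff the toy image ids match.
    return 1.0 if a == b else 0.0

def link(mention, kb, alpha=0.5):
    def score(entity):
        return (alpha * text_sim(mention["text"], entity["name"])
                + (1 - alpha) * img_sim(mention["img"], entity["img"]))
    return max(kb, key=score)["name"]

kb = [
    {"name": "Jaguar (animal)", "img": "cat_photo"},
    {"name": "Jaguar (car)", "img": "car_photo"},
]
# The text alone is ambiguous; the image disambiguates the mention.
mention = {"text": "a jaguar speeding down the road", "img": "car_photo"}
print(link(mention, kb))  # -> Jaguar (car)
```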
1 code implementation • 6 Aug 2023 • Zheng Ma, Mianzhi Pan, Wenhan Wu, Kanzhi Cheng, Jianbing Zhang, ShuJian Huang, Jiajun Chen
Experiments on our proposed datasets demonstrate that popular VLMs underperform in the food domain compared with their performance in the general domain.
1 code implementation • 2 Aug 2023 • Kanzhi Cheng, Zheng Ma, Shi Zong, Jianbing Zhang, Xinyu Dai, Jiajun Chen
Generating visually grounded image captions with specific linguistic styles using unpaired stylistic corpora is a challenging task, especially since we expect stylized captions with a wide variety of stylistic patterns.
1 code implementation • 2 Aug 2023 • Kanzhi Cheng, Wenpo Song, Zheng Ma, Wenhao Zhu, Zixuan Zhu, Jianbing Zhang
Considering that Vision-Language Pre-Training (VLP) models master massive such knowledge from large-scale web-harvested data, it is promising to utilize the generalizability of VLP models to incorporate knowledge into image descriptions.
no code implementations • 18 Oct 2022 • Zheng Ma, Shi Zong, Mianzhi Pan, Jianbing Zhang, ShuJian Huang, Xinyu Dai, Jiajun Chen
In recent years, vision and language pre-training (VLP) models have advanced the state-of-the-art results in a variety of cross-modal downstream tasks.
no code implementations • 2 Oct 2022 • Zhihuan Kuang, Shi Zong, Jianbing Zhang, Jiajun Chen, Hongfu Liu
In this paper, we consider a novel research problem: music-to-text synaesthesia.
1 code implementation • ACL 2019 • Peng Wu, Shu-Jian Huang, Rongxiang Weng, Zaixiang Zheng, Jianbing Zhang, Xiaohui Yan, Jia-Jun Chen
However, one critical problem is that current approaches only get high accuracy for questions whose relations have been seen in the training data.