1 code implementation • 7 Apr 2024 • Liqiang Jing, Xinya Du
To address these limitations, we propose an innovative method to align modalities in LVLMs through Fine-Grained Artificial Intelligence Feedback (FGAIF), which mainly consists of three steps: AI-based Feedback Collection, Fine-grained Reward Model Training, and Reinforcement Learning with Fine-grained Rewards.
no code implementations • 18 Feb 2024 • Liqiang Jing, Jingxuan Zuo, Yue Zhang
To evaluate the factuality of multimodal summarization models, we propose two fine-grained and explainable evaluation frameworks (FALLACIOUS) for different application scenarios, i.e., a reference-based factuality evaluation framework and a reference-free factuality evaluation framework.
no code implementations • 6 Feb 2024 • Kun Ouyang, Liqiang Jing, Xuemeng Song, Meng Liu, Yupeng Hu, Liqiang Nie
Although existing studies have achieved great success with the generative pretrained language model BART, they overlook the sentiments residing in the utterance, video, and audio, which are vital clues for sarcasm explanation.
no code implementations • 16 Dec 2023 • Mengzhao Jia, Can Xie, Liqiang Jing
Moreover, we propose a novel debiasing multimodal sarcasm detection framework with contrastive learning, which aims to mitigate the harmful effect of biased textual factors for robust OOD generalization.
no code implementations • 15 Dec 2023 • Liqiang Jing, Xuemeng Song, Xinxing Zu, Na Zheng, Zhongzhou Zhao, Liqiang Nie
Existing sign language translation methods follow a two-stage pipeline: first converting the sign language video to a gloss sequence (i.e., Sign2Gloss) and then translating the generated gloss sequence into a spoken language sentence (i.e., Gloss2Text).
1 code implementation • 2 Nov 2023 • Liqiang Jing, Ruosen Li, Yunmo Chen, Mengzhao Jia, Xinya Du
We introduce FAITHSCORE (Faithfulness to Atomic Image Facts Score), a reference-free and fine-grained evaluation metric that measures the faithfulness of the generated free-form answers from large vision-language models (LVLMs).
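The abstract describes faithfulness as a fine-grained measure over atomic image facts. A minimal sketch of one way such a score could be computed, assuming the answer has already been decomposed into atomic facts and a visual verifier is available (the `verify` callable and the toy facts below are illustrative stand-ins, not the paper's actual pipeline):

```python
# Sketch of a FAITHSCORE-style computation: faithfulness is the fraction of
# an answer's atomic facts that a verifier judges consistent with the image.
# Atomic-fact extraction and visual verification (normally done with an LLM
# and a vision model) are stubbed out for illustration.

def faithscore(atomic_facts, verify):
    """Return the fraction of atomic facts verified against the image."""
    if not atomic_facts:
        return 1.0  # an answer with no descriptive facts is vacuously faithful
    verified = sum(1 for fact in atomic_facts if verify(fact))
    return verified / len(atomic_facts)

# Toy example with a hard-coded "verifier".
facts = ["a dog is on the grass", "the dog is purple", "the sky is blue"]
supported = {"a dog is on the grass", "the sky is blue"}
score = faithscore(facts, lambda f: f in supported)
print(score)  # 2 of 3 facts supported -> 0.666...
```

Being reference-free, a metric of this shape needs no gold summary or answer: only the generated facts and the image are consulted.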
no code implementations • 11 Oct 2023 • Mengzhao Jia, Qianglong Chen, Liqiang Jing, Dawei Fu, Renyu Li
The prevalence of mental disorders has become a significant issue, leading to the increased focus on Emotional Support Conversation as an effective supplement for mental health support.
1 code implementation • 20 Jul 2023 • Teng Sun, Juntong Ni, Wenjie Wang, Liqiang Jing, Yinwei Wei, Liqiang Nie
To this end, we propose a general debiasing framework based on Inverse Probability Weighting (IPW), which adaptively assigns small weights to samples with larger bias (i.e., stronger spurious correlations).
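The weighting idea can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: `bias_probs` is assumed to come from some auxiliary biased model that scores how predictable each sample's label is from the spurious factor alone.

```python
# Sketch of Inverse Probability Weighting (IPW) for debiasing: each sample's
# training weight is inversely proportional to the estimated probability that
# its label is explained by the biased factor alone, so heavily biased
# samples contribute less to the loss. Weights are normalized to mean 1.

def ipw_weights(bias_probs, eps=1e-6):
    raw = [1.0 / (p + eps) for p in bias_probs]  # small weight for large bias
    total = sum(raw)
    return [w * len(raw) / total for w in raw]

probs = [0.9, 0.5, 0.1]  # higher prob -> more strongly biased sample
weights = ipw_weights(probs)
# The most biased sample (0.9) gets the smallest weight.
```

In training, these weights would typically multiply the per-sample loss (e.g., the `weight` argument of a weighted cross-entropy), downweighting spurious correlations without discarding any data.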
1 code implementation • 29 Jun 2023 • Liqiang Jing, Xuemeng Song, Kun Ouyang, Mengzhao Jia, Liqiang Nie
Multimodal Sarcasm Explanation (MuSE) is a new yet challenging task, which aims to generate a natural language sentence for a multimodal social post (an image as well as its caption) to explain why it contains sarcasm.
no code implementations • 5 May 2023 • Liqiang Jing, Xuemeng Song, Xuming Lin, Zhongzhou Zhao, Wei Zhou, Liqiang Nie
This task is non-trivial due to three challenges: ensuring the logic of the generated text, handling unstructured style references, and mitigating biased training samples.
1 code implementation • 24 Jul 2022 • Teng Sun, Wenjie Wang, Liqiang Jing, Yiran Cui, Xuemeng Song, Liqiang Nie
Inspired by this, we devise a model-agnostic counterfactual framework for multimodal sentiment analysis, which captures the direct effect of textual modality via an extra text model and estimates the indirect one by a multimodal model.
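The counterfactual step described above can be sketched as subtracting the direct textual effect from the total multimodal effect at inference time. The logit vectors and the scaling factor `alpha` below are illustrative assumptions, not values from the paper:

```python
# Sketch of counterfactual debiasing for multimodal sentiment analysis:
# a text-only model estimates the direct effect of the textual modality,
# which is subtracted from the multimodal prediction so that the remaining
# (indirect) multimodal effect drives the final decision.

def debiased_logits(multimodal_logits, text_only_logits, alpha=1.0):
    """Remove alpha times the text-only (direct) effect from the total effect."""
    return [m - alpha * t for m, t in zip(multimodal_logits, text_only_logits)]

mm = [2.0, 0.5, 0.1]   # multimodal model logits (total effect)
txt = [1.5, 0.2, 0.0]  # text-only model logits (direct textual effect)
print(debiased_logits(mm, txt))  # [0.5, 0.3, 0.1]
```

Because the subtraction happens only on output scores, such a framework is model-agnostic: any multimodal backbone can be paired with an extra text-only branch.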
no code implementations • 16 Jul 2022 • Xiaolin Chen, Xuemeng Song, Liqiang Jing, Shuo Li, Linmei Hu, Liqiang Nie
To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation.