Search Results for author: Heyang Liu

Found 4 papers, 2 papers with code

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

no code implementations21 Mar 2024 Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang

Although multiple academic video datasets have been constructed and released, few of them support both multimodal content recognition and understanding tasks, which is partially due to the lack of high-quality human annotations.

speech-recognition Speech Recognition +1

MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception

1 code implementation15 Jan 2024 Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang

We believe that these hallucinations are partially due to the models' struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception.

LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models

1 code implementation20 Aug 2023 Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang

While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task which necessitates precise alignment and deep interaction between speech and text features.

Multiple-choice Question Answering

Cannot find the paper you are looking for? You can Submit a new open access paper.