Search Results for author: Heyang Liu

Found 4 papers, 2 papers with code

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

no code implementations • 21 Mar 2024 • Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang

Although multiple academic video datasets have been constructed and released, few of them support both multimodal content recognition and understanding tasks, which is partially due to the lack of high-quality human annotations.

Speech Recognition +1

Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview

no code implementations • 1 Mar 2024 • Heyang Liu, Yu Wang, Yanfeng Wang

The end-to-end (E2E) approach is gradually replacing hybrid models for automatic speech recognition (ASR) tasks.

Automatic Speech Recognition (ASR) +1

MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception

1 code implementation • 15 Jan 2024 • Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang

We believe that these hallucinations are partially due to the models' struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception.

LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models

1 code implementation • 20 Aug 2023 • Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang

While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task, which necessitates precise alignment and deep interaction between speech and text features.

Multiple-choice Question Answering
