Search Results for author: Qilang Ye

Found 2 papers, 2 papers with code

Answering Diverse Questions via Text Attached with Key Audio-Visual Clues

1 code implementation • 11 Mar 2024 • Qilang Ye, Zitong Yu, Xin Liu

Audio-visual question answering (AVQA) requires reference to video content and auditory information, followed by correlating the question to predict the most precise answer.

Audio-visual Question Answering • Audio-Visual Question Answering (AVQA) • +3


CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

1 code implementation • 7 Mar 2024 • Qilang Ye, Zitong Yu, Rui Shao, Xinyu Xie, Philip Torr, Xiaochun Cao

This paper focuses on the challenge of answering questions in scenarios that are composed of rich and complex dynamic audio-visual components.

Ranked #4 on Video-based Generative Performance Benchmarking on VideoInstruct

Audio-visual Question Answering • Audio-Visual Question Answering (AVQA) • +5

