Audio-visual Question Answering
12 papers with code • 1 benchmark • 1 dataset
Most implemented papers
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
Audio-visual question answering (AVQA) requires drawing on both the video content and the auditory information, then correlating them with the question to predict the most accurate answer.
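To make the task description concrete, here is a minimal sketch of one common AVQA pattern: question-guided attention pooling over visual and audio segment features, followed by late fusion and scoring against a fixed answer vocabulary. It is not the method of either paper listed here. It assumes per-segment feature vectors and a question embedding have already been produced by some pretrained encoders (not shown). The sum-based fusion, dot-product scoring, and all function names are hypothetical, illustrative choices.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, segments: List[Vector]) -> Vector:
    """Question-guided attention: weight each segment by its similarity to the question."""
    weights = softmax([dot(query, seg) for seg in segments])
    return [sum(w * seg[i] for w, seg in zip(weights, segments)) for i in range(len(query))]


def predict_answer(question: Vector, visual: List[Vector], audio: List[Vector],
                   answer_embeddings: Dict[str, Vector]) -> str:
    """Hypothetical AVQA head: attend over each modality, fuse, pick the best-scoring answer."""
    v = attend(question, visual)   # question-relevant visual summary
    a = attend(question, audio)    # question-relevant audio summary
    # Late fusion by element-wise sum (an illustrative assumption, not a known model's choice).
    fused = [q + vi + ai for q, vi, ai in zip(question, v, a)]
    # Score every candidate answer against the fused representation.
    return max(answer_embeddings, key=lambda ans: dot(fused, answer_embeddings[ans]))


# Toy usage with 2-d features standing in for real encoder outputs.
answers = {"guitar": [1.0, 0.0], "piano": [0.0, 1.0]}
visual = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.9, 0.1], [0.0, 1.0]]
print(predict_answer([1.0, 0.0], visual, audio, answers))  # → guitar
```

Treating answering as classification over a closed answer set keeps the sketch short; real systems typically replace the hand-written attention and sum fusion with learned projections and trained fusion layers.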
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
The former leads to a large, diverse test space, while the latter results in a comprehensive robustness evaluation on rare, frequent, and overall questions.
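A robustness evaluation that reports accuracy separately on rare, frequent, and overall questions can be sketched as follows. The rule used here to decide which questions count as "frequent" or "rare" is a hypothetical one chosen for illustration: within each question type, an answer is frequent if its count is at least `ratio` times the mean answer count for that type. It is not necessarily the split used by the paper above. The `Example` record and function names are likewise assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple


class Example(NamedTuple):
    qtype: str       # question type label (hypothetical)
    answer: str      # ground-truth answer
    prediction: str  # model prediction


def frequency_groups(examples: List[Example], ratio: float = 1.0) -> List[str]:
    """Label each example 'frequent' or 'rare' using a hypothetical per-type rule."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for ex in examples:
        counts[ex.qtype][ex.answer] += 1
    labels = []
    for ex in examples:
        c = counts[ex.qtype]
        mean = sum(c.values()) / len(c)  # mean count over answers seen for this type
        labels.append("frequent" if c[ex.answer] >= ratio * mean else "rare")
    return labels


def robustness_report(examples: List[Example], ratio: float = 1.0) -> Dict[str, float]:
    """Accuracy on rare, frequent, and overall questions."""
    labels = frequency_groups(examples, ratio)
    hits: Dict[str, List[bool]] = defaultdict(list)
    for ex, label in zip(examples, labels):
        correct = ex.prediction == ex.answer
        hits[label].append(correct)
        hits["overall"].append(correct)
    return {group: sum(v) / len(v) for group, v in hits.items()}
```

For example, a model that always answers "two" on counting questions where "two" is the majority answer gets 0.75 overall, 1.0 on frequent questions, and 0.0 on rare ones. A bias toward common answers that the overall score alone would hide becomes visible in the per-group breakdown.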