Audio-Video Question Answering (AVQA)
1 paper with code • 0 benchmarks • 0 datasets
Audio-Video Question Answering (AVQA) is the task of answering natural-language questions about a video by reasoning jointly over its visual and audio content.
Benchmarks
These leaderboards are used to track progress in Audio-Video Question Answering (AVQA)
No evaluation results yet. Help compare methods by submitting evaluation metrics.
Most implemented papers
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Unlike widely studied vision-language pretraining models, VALOR jointly models the relationships among vision, audio, and language in an end-to-end manner.
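To make the "joint modeling" idea concrete, the toy sketch below (plain NumPy, made-up dimensions; this is an illustration of tri-modal fusion in general, not VALOR's actual architecture) concatenates vision, audio, and text embeddings and passes them through one shared projection, so the fused representation depends on all three modalities at once:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-modality embeddings, standing in for the outputs of
# separate vision, audio, and text encoders (dimensions are illustrative).
D = 8
vision = rng.standard_normal(D)
audio = rng.standard_normal(D)
text = rng.standard_normal(D)

# Joint modeling: concatenate all three modalities and apply a single
# shared projection, so every fused feature mixes vision, audio, and
# language together (in a real model this is trained end-to-end).
W = rng.standard_normal((D, 3 * D))
fused = np.tanh(W @ np.concatenate([vision, audio, text]))

print(fused.shape)  # (8,)
```

A question-answering head would then score candidate answers against `fused`; the key point is that no modality is fused in isolation.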