Search Results for author: Qunbo Wang

Found 3 papers, 1 paper with code

Boter: Bootstrapping Knowledge Selection and Question Answering for Knowledge-based VQA

no code implementations • 22 Apr 2024 • Dongze Hao, Qunbo Wang, Longteng Guo, Jie Jiang, Jing Liu

Knowledge-based Visual Question Answering (VQA) requires models to incorporate external knowledge to respond to questions about visual content.

Tasks: Language Modelling, Large Language Model, +2
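
The Boter abstract above only states that knowledge-based VQA models must draw on external knowledge. As an illustration of the general retrieve-then-answer pattern, and not the paper's actual method, here is a minimal Python sketch; the names (KNOWLEDGE_BASE, retrieve_knowledge, answer_question) and the token-overlap scoring are hypothetical placeholders.

# Hypothetical sketch of a retrieve-then-answer pipeline for knowledge-based VQA.
# Nothing here comes from the Boter paper; scoring is simple token overlap for clarity.

KNOWLEDGE_BASE = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Golden retrievers are a dog breed originally bred for retrieving game.",
    "The Mona Lisa is a painting of a woman painted by Leonardo da Vinci.",
]

def token_overlap(a: str, b: str) -> int:
    """Count shared lowercase tokens between two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve_knowledge(question: str, image_caption: str, top_k: int = 1) -> list:
    """Select the knowledge passages most relevant to the question plus visual context."""
    query = f"{question} {image_caption}"
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: token_overlap(query, doc), reverse=True)
    return ranked[:top_k]

def answer_question(question: str, image_caption: str) -> str:
    """Combine retrieved knowledge with the visual context into an answer prompt."""
    knowledge = retrieve_knowledge(question, image_caption)
    # In a real system this prompt would go to a multimodal model; here we just return it.
    return f"Context: {image_caption}\nKnowledge: {' '.join(knowledge)}\nQ: {question}\nA:"

print(answer_question("Who painted this artwork?", "a painting of a woman with a subtle smile"))

In practice the retrieved passages would be consumed together with image features by a multimodal answering model rather than formatted into a prompt string.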

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

1 code implementation • NeurIPS 2023 • Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu

Based on the proposed VAST-27M dataset, we train VAST, an omni-modality video-text foundation model that can perceive and process vision, audio, and subtitle modalities from video and better support a range of tasks, including vision-text, audio-text, and multi-modal video-text tasks (retrieval, captioning, and QA).

Ranked #1 on Image Captioning on COCO Captions (SPICE metric, using extra training data)

Tasks: Audio Captioning, Audio-Visual Captioning, +14
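
The VAST abstract describes a model that perceives vision, audio, and subtitle inputs from video. As a rough illustration of one way such omni-modality fusion can be wired, and not the released VAST architecture, here is a PyTorch sketch; the module names and dimensions are illustrative assumptions.

# Hypothetical sketch of omni-modality fusion: project per-modality token
# features into a shared space, concatenate them, and fuse with self-attention.
import torch
import torch.nn as nn

class OmniModalityFusion(nn.Module):
    def __init__(self, vision_dim=768, audio_dim=512, text_dim=768, shared_dim=512):
        super().__init__()
        # One linear projection per modality into a shared embedding space.
        self.vision_proj = nn.Linear(vision_dim, shared_dim)
        self.audio_proj = nn.Linear(audio_dim, shared_dim)
        self.subtitle_proj = nn.Linear(text_dim, shared_dim)
        # A transformer encoder layer attends across the concatenated tokens.
        self.fusion = nn.TransformerEncoderLayer(
            d_model=shared_dim, nhead=8, batch_first=True
        )

    def forward(self, vision_tokens, audio_tokens, subtitle_tokens):
        # Project each modality, concatenate along the token dimension, then fuse.
        fused_input = torch.cat(
            [
                self.vision_proj(vision_tokens),
                self.audio_proj(audio_tokens),
                self.subtitle_proj(subtitle_tokens),
            ],
            dim=1,
        )
        return self.fusion(fused_input)

# Example: a batch of 2 clips with 16 vision, 8 audio, and 12 subtitle tokens.
model = OmniModalityFusion()
out = model(torch.randn(2, 16, 768), torch.randn(2, 8, 512), torch.randn(2, 12, 768))
print(out.shape)  # torch.Size([2, 36, 512])

Concatenating the projected token sequences lets a single encoder attend across modalities; according to the abstract, VAST builds on such joint representations to support vision-text, audio-text, and multi-modal video-text tasks.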
