Search Results for author: Duokang Wang

Found 2 papers, 0 papers with code

VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools

no code implementations • 16 Oct 2023 • Ji Qi, Kaixuan Ji, Jifan Yu, Duokang Wang, Bin Xu, Lei Hou, Juanzi Li

Building models that comprehend videos and respond to specific user instructions is a practical and challenging topic, as it requires mastery of both vision understanding and knowledge reasoning.

Caption Generation, Descriptive, +3 more
