1 code implementation • 30 Nov 2023 • Tongjia Chen, Hongshan Yu, Zhengeng Yang, Zechuan Li, Wei Sun, Chen Chen
Due to the resource-intensive nature of training vision-language models on expansive video data, a majority of studies have centered on adapting pre-trained image-language models to the video domain.
Ranked #2 on Zero-Shot Action Recognition on Kinetics
1 code implementation • 23 Jun 2023 • Tom Tongjia Chen, Hongshan Yu, Zhengeng Yang, Ming Li, Zechuan Li, Jingwen Wang, Wei Miao, Wei Sun, Chen Chen
Affordance-Centric Question-driven Task Completion (AQTC) has been proposed to acquire knowledge from videos to furnish users with comprehensive and systematic instructions.
1 code implementation • CVPR 2023 • Zechuan Li, Hongshan Yu, Zhengeng Yang, Tongjia Chen, Naveed Akhtar
In this work, we propose AShapeFormer, a semantics-guided object-level shape encoding module for 3D object detection.