1 code implementation • 4 Apr 2024 • Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang
In this way, we encode video representations that incorporate both local and global information, enabling the LLM to generate comprehensive responses for long-term videos.
1 code implementation • NeurIPS 2023 • Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang
By reusing predictions from key frames, we circumvent the need to process a large volume of video frames individually with resource-intensive segmentors, alleviating temporal redundancy and significantly reducing computational costs.
no code implementations • 2 Feb 2023 • Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen
Recent advances in Transformers have come with a huge demand for computing resources, highlighting the importance of developing efficient training techniques that make Transformer training faster, cheaper, and more accurate through the efficient use of computation and memory resources.
no code implementations • 21 Jul 2022 • Yuetian Weng, Zizheng Pan, Mingfei Han, Xiaojun Chang, Bohan Zhuang
The task of action detection aims to determine both the action category and the start and end times of each action instance in a long, untrimmed video.
1 code implementation • Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) 2021 • Mingjie Li, Wenjia Cai, Rui Liu, Yuetian Weng, Xiaoyun Zhao, Cong Wang, Xin Chen, Zhong Liu, Caineng Pan, Mengke Li, Yizhi Liu, Flora D Salim, Karin Verspoor, Xiaodan Liang, Xiaojun Chang
Researchers have explored advanced methods from computer vision and natural language processing to incorporate medical domain knowledge for the generation of readable medical reports.