1 code implementation • 31 Jan 2024 • Xingning Dong, Zipeng Feng, Chunluan Zhou, Xuzheng Yu, Ming Yang, Qingpei Guo
We then summarize this empirical study into the M2-RAAP recipe, where our technical contributions lie in 1) the data filtering and text re-writing pipeline resulting in 1M high-quality bilingual video-text pairs, 2) the replacement of video inputs with key-frames to accelerate pre-training, and 3) the Auxiliary-Caption-Guided (ACG) strategy to enhance video features.
1 code implementation • 1 Nov 2021 • Weixin Xu, Zipeng Feng, Shuangkang Fang, Song Yuan, Yi Yang, Shuchang Zhou
For example, Transformer Networks do not have native support on many popular chips, and hence are difficult to deploy.