no code implementations • 1 Nov 2023 • Shi Yin, Shijie Huan, Shangfei Wang, Jinshui Hu, Tao Guo, Bing Yin, BaoCai Yin, Cong Liu
For temporal modeling, we propose a recurrent token mixing mechanism, an axis-landmark-positional embedding mechanism, as well as a confidence-enhanced multi-head attention mechanism to adaptively and robustly embed long-term landmark dynamics into their 1D representations; for structure modeling, we design intra-group and inter-group structure modeling mechanisms to encode the component-level as well as global-level facial structure patterns as a refinement for the 1D representations of landmarks through token communications in the spatial dimension via 1D convolutional layers.
no code implementations • 4 Aug 2023 • Yin Lin, Cong Liu, Yehansen Chen, Jinshui Hu, Bing Yin, BaoCai Yin, Zengfu Wang
Recently, visual-language learning has shown great potential in enhancing visual-based person re-identification (ReID).
1 code implementation • CVPR 2023 • Yingjie Wang, Jiajun Deng, Yao Li, Jinshui Hu, Cong Liu, Yu Zhang, Jianmin Ji, Wanli Ouyang, Yanyong Zhang
LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints.
no code implementations • 2 Sep 2022 • Jinshui Hu, Chenyu Liu, Qiandong Yan, Xuyang Zhu, Jiajia Wu, Jun Du, LiRong Dai
However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance and SOTA recognition models usually perform poorly on OOV settings.