no code implementations • 1 Nov 2023 • Shi Yin, Shijie Huan, Shangfei Wang, Jinshui Hu, Tao Guo, Bing Yin, BaoCai Yin, Cong Liu
For temporal modeling, we propose a recurrent token mixing mechanism, an axis-landmark-positional embedding mechanism, as well as a confidence-enhanced multi-head attention mechanism to adaptively and robustly embed long-term landmark dynamics into their 1D representations; for structure modeling, we design intra-group and inter-group structure modeling mechanisms to encode the component-level as well as global-level facial structure patterns as a refinement for the 1D representations of landmarks through token communications in the spatial dimension via 1D convolutional layers.