ICCV 2023 • Huaiwen Zhang, Zihang Guo, Yang Yang, Xin Liu, De Hu
In this paper, we propose Cross-modal Contextualized Sequence Transduction (C2ST), a method for continuous sign language recognition (CSLR) that effectively incorporates knowledge of the gloss sequence into both video representation learning and sequence transduction.