no code implementations • 5 Jun 2023 • Yukang Liang, Kaitao Song, Shaoguang Mao, Huiqiang Jiang, Luna Qiu, Yuqing Yang, Dongsheng Li, Linli Xu, Lili Qiu
Pronunciation assessment is a major challenge in the computer-aided pronunciation training system, especially at the word (phoneme)-level.
no code implementations • 4 Apr 2023 • Yongxin Zhu, Zhen Liu, Yukang Liang, Xin Li, Hao liu, Changcun Bao, Linli Xu
Different to conventional STVQA models which take the linguistic semantics and visual semantics in scene text as two separate features, in this paper, we propose a paradigm of "Locate Then Generate" (LTG), which explicitly unifies this two semantics with the spatial bounding box as a bridge connecting them.