no code implementations • 8 Apr 2024 • He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie
Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video.
no code implementations • 9 Mar 2023 • Kai Liu, Ziqing Du, Xucheng Wan, Huan Zhou
To mitigate the pressing SC issue, we reformulate the training objective and propose two novel loss schemes that exploit a reconstruction-improvement metric defined at the small-chunk level and leverage the distribution information associated with that metric.
no code implementations • 16 Jan 2023 • Kai Liu, Xucheng Wan, Ziqing Du, Huan Zhou
As a practical alternative to speech separation, target speaker extraction (TSE) aims to extract the speech of the desired speaker using an additional cue extracted from that speaker.
no code implementations • 24 Sep 2022 • Ziqing Du, Kai Liu, Xucheng Wan, Huan Zhou
Overlapped speech detection (OSD) is critical for speech applications in multi-party conversation scenarios.
no code implementations • 24 Sep 2022 • Xucheng Wan, Kai Liu, Ziqing Du, Huan Zhou
To validate the effectiveness of our proposed model, extensive experiments are conducted on the DNS2020 dataset.