no code implementations • 19 Apr 2024 • Zhaoxi Mu, Xinyu Yang
In audio-visual target speech extraction tasks, the audio modality tends to dominate, potentially overshadowing the importance of visual guidance.
no code implementations • 16 Dec 2023 • Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang
However, in the task of target speech extraction, elements of the global and local semantic information in the reference speech that are irrelevant to speaker identity can cause speaker confusion within the speech extraction network.
no code implementations • 7 Mar 2023 • Zhaoxi Mu, Xinyu Yang, Wenjing Zhu
Specifically, we design a new network, SE-Conformer, that can model audio sequences across multiple dimensions and scales, and apply it within the dual-path speech separation framework.
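The dual-path framework mentioned above (popularized by DPRNN-style models) works by cutting a long audio sequence into overlapping chunks, modeling within each chunk (intra-chunk) and across chunks (inter-chunk), then reconstructing by overlap-add. The sketch below illustrates only that generic segmentation step with NumPy; it is not the authors' SE-Conformer implementation, and the names `segment`, `chunk_len`, and `hop` are illustrative.

```python
import numpy as np

def segment(x, chunk_len, hop):
    """Split a 1-D sequence into overlapping chunks (dual-path stage 1).

    Intra-chunk modeling then runs along axis 1 of the result, and
    inter-chunk modeling along axis 0, before overlap-add reconstruction.
    """
    n_chunks = 1 + (len(x) - chunk_len) // hop
    return np.stack([x[i * hop : i * hop + chunk_len] for i in range(n_chunks)])

# Toy input: 16 samples, chunks of length 4 with 50% overlap.
x = np.arange(16.0)
chunks = segment(x, chunk_len=4, hop=2)
# chunks.shape -> (7, 4)
```

Because every chunk is short, the intra-chunk model sees only local context, while the inter-chunk model stitches long-range dependencies together at a much lower sequence length, which is what makes dual-path processing tractable for long audio.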
no code implementations • 7 Mar 2023 • Zhaoxi Mu, Xinyu Yang, Xiangyuan Yang, Wenjing Zhu
In noisy and reverberant environments, the performance of deep learning-based speech separation methods drops dramatically, because prior methods are neither designed nor optimized for such conditions.
no code implementations • 20 Apr 2021 • Zhaoxi Mu, Xinyu Yang, Yizhuo Dong
As an indispensable part of modern human-computer interaction systems, speech synthesis technology helps users obtain the output of intelligent machines more easily and intuitively, and has thus attracted increasing attention.