no code implementations • 16 Jan 2024 • Haobin Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang
We introduce ED-TTS, a multi-scale emotional speech synthesis model that leverages Speech Emotion Diarization (SED) and Speech Emotion Recognition (SER) to model emotions at different levels.
no code implementations • 14 Mar 2023 • Xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao
By predicting all target tokens in parallel, non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models.
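The efficiency gap can be sketched with a toy comparison (hypothetical illustration, not the paper's model): autoregressive decoding needs one sequential forward pass per output token, while non-autoregressive decoding predicts all tokens in one pass (or a small fixed number of refinement iterations), independent of sequence length.

```python
def autoregressive_steps(target_len: int) -> int:
    # One sequential forward pass per token: each token
    # conditions on all previously emitted tokens.
    return target_len

def non_autoregressive_steps(target_len: int, refinement_iters: int = 1) -> int:
    # All tokens are predicted in parallel; the cost is a fixed
    # number of passes, independent of the sequence length.
    return refinement_iters

T = 50  # hypothetical target transcript length in tokens
print(autoregressive_steps(T))      # 50 sequential passes
print(non_autoregressive_steps(T))  # 1 pass regardless of length
```

The trade-off, as the abstract implies, is that parallel prediction removes the left-to-right dependency that autoregressive models exploit, which is why decoding speed improves at some potential cost to accuracy.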
no code implementations • 14 Mar 2023 • Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Recent expressive text-to-speech (TTS) models focus on synthesizing emotional speech but neglect some fine-grained styles such as intonation.
no code implementations • 28 May 2022 • Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, Jing Xiao
In our experiments, with augmentation-based unsupervised learning, our KWS model achieves better performance than other unsupervised methods such as CPC, APC, and MPC.