no code implementations • 28 Sep 2023 • Xiang Lyu, Yuhang Cao, Qing Wang, JingJing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu
Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts.
no code implementations • 17 Sep 2023 • Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, JingJing Yin, Hongbin Zhou, Heng Lu, Lei Xie
In this study, we propose PromptVC, a novel style voice conversion approach that employs a latent diffusion model to generate a style vector driven by natural language prompts.
no code implementations • 8 Aug 2023 • Yu Pan, Yuguang Yang, Yuheng Huang, Jixun Yao, JingJing Yin, Yanni Hu, Heng Lu, Lei Ma, Jianjun Zhao
Despite notable progress, speech emotion recognition (SER) remains challenging due to the intricate and ambiguous nature of speech emotion, particularly in wild world.
no code implementations • 13 Jun 2023 • Yu Pan, Yanni Hu, Yuguang Yang, Wen Fei, Jixun Yao, Heng Lu, Lei Ma, Jianjun Zhao
Contrastive cross-modality pretraining has recently exhibited impressive success in diverse fields, whereas there is limited research on their merits in speech emotion recognition (SER).