no code implementations • 7 Dec 2022 • Daxin Tan, Nikos Kargas, David McHardy, Constantinos Papayiannis, Antonio Bonafonte, Marek Strelec, Jonas Rohnke, Agis Oikonomou Filandras, Trevor Wood
Entrainment is the phenomenon by which an interlocutor adapts their speaking style to align with their partner in conversations.
no code implementations • 12 Apr 2022 • Daxin Tan, Liqun Deng, Nianzu Zheng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee
This study proposes a fully automated system for speech correction and accent reduction.
no code implementations • 31 Mar 2022 • Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao
However, these works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with TTS fine-tuning, where phonemes are taken as input.
no code implementations • 8 Oct 2021 • Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee
However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data.
no code implementations • 8 Oct 2021 • Daxin Tan, Guangyan Zhang, Tan Lee
The key idea is to model the acoustic environment in speech audio as a factor of data variability and incorporate it as a condition in the process of neural network-based speech synthesis.
no code implementations • 5 Aug 2021 • Guangyan Zhang, Ying Qin, Daxin Tan, Tan Lee
This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle.
1 code implementation • 4 Jul 2021 • Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee
This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation in speech quality and naturalness.
no code implementations • 8 Mar 2021 • Daxin Tan, Hingpang Huang, Guangyan Zhang, Tan Lee
Track 1 and track 2 provide 100 and 5 utterances, respectively, from 3 target speakers with different voices and styles, and participants are required to synthesize speech in each target speaker's voice and style.
no code implementations • 8 Nov 2020 • Daxin Tan, Tan Lee
By incorporating a style predictor, the proposed system can also be used for text-to-speech synthesis.