no code implementations • 2 Feb 2024 • Dan Lyth, Simon King
We propose a scalable method for labeling various aspects of speaker identity, style, and recording conditions.
no code implementations • 2 Jun 2023 • Alistair Carson, Cassia Valentini-Botinhao, Simon King, Stefan Bilbao
The frame duration is an important hyper-parameter of the proposed model, so an investigation was carried out into its effect on model accuracy.
no code implementations • 17 May 2023 • Atli Thor Sigurgeirsson, Simon King
Given only a natural language query text (the prompt), such models can be used to solve specific, context-dependent tasks.
no code implementations • 3 Apr 2023 • Tian Huey Teh, Vivian Hu, Devang S Ram Mohan, Zack Hodari, Christopher G. R. Wallis, Tomás Gomez Ibarrondo, Alexandra Torresquintero, James Leoni, Mark Gales, Simon King
Generating expressive speech with rich and varied prosody continues to be a challenge for Text-to-Speech.
no code implementations • 7 Mar 2023 • Atli Thor Sigurgeirsson, Simon King
This is done by using a learned embedding of the reference utterance, which is used to condition speech generation.
no code implementations • 15 Jun 2021 • Alexandra Torresquintero, Tian Huey Teh, Christopher G. R. Wallis, Marlene Staib, Devang S Ram Mohan, Vivian Hu, Lorenzo Foglianti, Jiameng Gao, Simon King
Text-to-speech is now able to achieve near-human naturalness, and research focus has shifted to increasing expressivity.
no code implementations • 15 Jun 2021 • Devang S Ram Mohan, Vivian Hu, Tian Huey Teh, Alexandra Torresquintero, Christopher G. R. Wallis, Marlene Staib, Lorenzo Foglianti, Jiameng Gao, Simon King
Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text.
no code implementations • 7 Dec 2020 • Pilar Oplustil-Gallegos, Simon King
Many speech synthesis datasets, especially those derived from audiobooks, naturally comprise sequences of utterances.
1 code implementation • 14 Mar 2020 • Zack Hodari, Catherine Lai, Simon King
In English, prosody adds a broad range of information to segment sequences, from information structure (e.g. contrast) to stylistic variation (e.g. expression of emotion).
2 code implementations • 28 Feb 2020 • Jennifer Williams, Joanna Rownicka, Pilar Oplustil, Simon King
Our NN predicts MOS with a high correlation to human judgments.
1 code implementation • 10 Jun 2019 • Zack Hodari, Oliver Watts, Simon King
A generative model that can synthesise multiple prosodies will, by design, not model average prosody.
1 code implementation • 31 Oct 2018 • Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King
In this work, we propose our replay attack detection system, the Attentive Filtering Network, which combines an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains with a ResNet-based classifier.
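The core idea of the attention-based filtering step can be sketched in a few lines: a learned attention map (here an arbitrary array standing in for the attention network's output) gates the time-frequency feature map element-wise before it is passed to the classifier. This is a simplified illustration of the general mechanism, not the authors' architecture; the function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attentive_filter(features, attention_logits):
    """Gate a time-frequency feature map element-wise with an attention mask.
    In the real system the logits come from a small attention network; here
    they are an arbitrary array of the same shape (illustrative only)."""
    mask = sigmoid(attention_logits)  # values in (0, 1)
    return features * mask            # emphasise informative regions

spec = np.ones((4, 5))      # toy log-spectrogram: 4 frequency bins x 5 frames
logits = np.zeros((4, 5))   # sigmoid(0) = 0.5, so every cell is halved
filtered = attentive_filter(spec, logits)
```

The gated output keeps the input's shape, so it can feed any downstream classifier unchanged.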
no code implementations • 22 Aug 2016 • Srikanth Ronanki, Oliver Watts, Simon King, Gustav Eje Henter
This paper proposes a new approach to duration modelling for statistical parametric speech synthesis in which a recurrent statistical model is trained to output a phone transition probability at each timestep (acoustic frame).
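The duration model described above can be illustrated with a minimal sampling loop: given per-frame transition probabilities, the phone's duration is the first frame at which a transition fires. This is a sketch of the sampling step only, with hypothetical names; in the paper a recurrent model predicts the probability at each acoustic frame.

```python
import random

def sample_phone_duration(transition_probs, rng):
    """Sample a phone duration (in acoustic frames) from per-frame phone
    transition probabilities: at each frame a coin flip decides whether the
    phone ends, and the duration is the first frame where it does.
    Illustrative sketch, not the authors' implementation."""
    for t, p in enumerate(transition_probs):
        if rng.random() < p:
            return t + 1
    return len(transition_probs)  # force a transition at the final frame

# Fixed probabilities stand in for a recurrent model's frame-wise outputs.
probs = [0.0, 0.0, 0.1, 0.5, 0.9, 1.0]
d = sample_phone_duration(probs, random.Random(0))
```

Because the first two probabilities are zero and the last is one, any sampled duration falls between 3 and 6 frames.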
no code implementations • 18 Aug 2016 • Srikanth Ronanki, Siva Reddy, Bajibabu Bollepalli, Simon King
These methods first convert the ASCII text to a phonetic script, and then train a deep neural network to synthesise speech from it.
no code implementations • 22 Feb 2016 • Zhizheng Wu, Simon King
We propose two novel techniques, stacking bottleneck features and a minimum generation error training criterion, to improve the performance of deep neural network (DNN)-based speech synthesis.
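The feature-stacking part of this idea can be sketched briefly: each frame's bottleneck vector is concatenated with its neighbouring frames' vectors to give the DNN temporal context. The function below is a hypothetical illustration of that stacking step (edge frames padded by repetition), not the authors' code; the context width is an assumed parameter.

```python
import numpy as np

def stack_bottleneck(bottleneck, context=2):
    """Concatenate each frame's bottleneck vector with its +/- `context`
    neighbours (edges padded by repeating the boundary frame), producing
    the augmented per-frame input for a second DNN. Illustrative sketch."""
    T, D = bottleneck.shape
    padded = np.pad(bottleneck, ((context, context), (0, 0)), mode="edge")
    # One shifted view per context offset, concatenated along features.
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

bn = np.arange(12, dtype=float).reshape(6, 2)  # 6 frames, 2-dim bottleneck
stacked = stack_bottleneck(bn, context=2)      # 6 frames, (2*2+1)*2 = 10 dims
```

With context 2, the centre slice of the stacked vector (columns 4:6 here) is the original frame's own bottleneck vector.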
no code implementations • 11 Jan 2016 • Zhizheng Wu, Simon King
Recently, recurrent neural networks (RNNs) as powerful sequence models have re-emerged as a potential acoustic model for statistical parametric speech synthesis (SPSS).