Speech synthesis is the task of generating speech from text.
Please note that the state-of-the-art tables here are not directly comparable between studies: they use mean opinion score (MOS) as the metric, and each study collects its ratings from a different sample of raters on Amazon Mechanical Turk.
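A MOS is the average of subjective 1 to 5 ratings, so it carries sampling error that depends on who rated the audio. The minimal sketch below computes a MOS with an approximate 95% normal confidence interval. The two rater pools are made-up data for illustration only, and the normal approximation (z = 1.96) is an assumption. Published studies may report intervals differently.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) of an approximate normal CI for 1-5 ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.mean(ratings)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, z * sem


# Hypothetical ratings of the SAME system from two different rater pools.
pool_a = [4, 5, 4, 4, 5, 3, 4, 5]
pool_b = [3, 4, 4, 3, 4, 3, 4, 4]

for name, pool in [("pool A", pool_a), ("pool B", pool_b)]:
    mean, half_width = mos_with_ci(pool)
    print(f"{name}: MOS {mean:.2f} +/- {half_width:.2f}")
```

Because the score depends on the rater pool as well as on the system, two papers' MOS values can differ even when they evaluate identical audio.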
Clone a voice in 5 seconds to generate arbitrary speech in real-time
SOTA for Text-To-Speech Synthesis on LJSpeech (using extra training data)
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text.
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.
#4 best model for Speech Synthesis on North American English
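The multi-stage pipeline described above (text analysis frontend, acoustic model, audio synthesis module) can be sketched as three composed functions. Every function here (`text_frontend`, `acoustic_model`, `vocoder`, `synthesize`) is a hypothetical toy placeholder that only shows how data flows between stages. Real systems use phoneme conversion, learned acoustic models, and neural or signal-processing vocoders.

```python
from typing import List


def text_frontend(text: str) -> List[str]:
    # Hypothetical placeholder: real frontends normalize numbers and abbreviations
    # and convert graphemes to phonemes; here we keep lowercase letters and spaces.
    return [c for c in text.lower() if c.isalpha() or c == " "]


def acoustic_model(symbols: List[str], frames_per_symbol: int = 2) -> List[List[float]]:
    # Hypothetical placeholder: emits a fixed number of 1-D feature frames per
    # symbol (a real model would predict e.g. spectrogram frames).
    frames: List[List[float]] = []
    for s in symbols:
        value = (ord(s) % 32) / 32.0
        frames.extend([value] for _ in range(frames_per_symbol))
    return frames


def vocoder(frames: List[List[float]], samples_per_frame: int = 4) -> List[float]:
    # Hypothetical placeholder: turns each feature frame into constant samples.
    return [frame[0] for frame in frames for _ in range(samples_per_frame)]


def synthesize(text: str) -> List[float]:
    # Text -> linguistic symbols -> acoustic frames -> waveform samples.
    return vocoder(acoustic_model(text_frontend(text)))


audio = synthesize("Hi!")
print(len(audio))
```

End-to-end models such as Tacotron aim to replace the hand-built frontend and acoustic stages with a single learned network.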
In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms.
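Flow-based models like WaveGlow are built from invertible layers, so the same network can map audio to noise (for training by exact likelihood) and noise back to audio (for synthesis). The sketch below shows one affine coupling layer, a building block used by Glow-style flows, conditioned on a mel-spectrogram vector `cond`. `coupling_net` is a hypothetical stand-in for the learned network, and the invertible 1x1 convolutions and multiple stacked layers are omitted.

```python
import math
from typing import List, Tuple


def coupling_net(xa: List[float], cond: List[float]) -> Tuple[List[float], List[float]]:
    # Hypothetical stand-in for the learned network. It can be any function of
    # (xa, cond) because it is recomputed, never inverted.
    log_s = [0.1 * a + 0.05 * c for a, c in zip(xa, cond)]
    t = [0.5 * a - c for a, c in zip(xa, cond)]
    return log_s, t


def coupling_forward(x: List[float], cond: List[float]) -> Tuple[List[float], float]:
    # Split the input; the first half passes through unchanged.
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = coupling_net(xa, cond)
    # Affine transform of the second half: yb = xb * exp(log_s) + t.
    yb = [b * math.exp(ls) + tt for b, ls, tt in zip(xb, log_s, t)]
    # The Jacobian is triangular, so log|det| is just the sum of log_s.
    return xa + yb, sum(log_s)


def coupling_inverse(y: List[float], cond: List[float]) -> List[float]:
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    # ya equals xa, so the same scale and shift can be recomputed exactly.
    log_s, t = coupling_net(ya, cond)
    xb = [(b - tt) * math.exp(-ls) for b, ls, tt in zip(yb, log_s, t)]
    return ya + xb


x = [0.3, -1.2, 0.7, 2.0]
cond = [0.1, -0.4]
y, log_det = coupling_forward(x, cond)
print(coupling_inverse(y, cond))
```

The cheap log-determinant is what makes exact maximum-likelihood training tractable for this kind of model.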
We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system.
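In an attention-based TTS decoder, each output step attends over the encoded text to decide which input symbols to read. The sketch below is generic scaled dot-product attention for a single decoder query, written to illustrate the idea; it is not Deep Voice 3's exact attention block, and the vectors used are made-up examples.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(
    query: List[float], keys: List[List[float]], values: List[List[float]]
) -> Tuple[List[float], List[float]]:
    d = len(query)
    # One score per encoder timestep: scaled dot product of query and key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder values.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, context


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # hypothetical encoder keys
values = [[0.5, 0.1], [0.2, 0.9], [0.0, 0.0]]  # hypothetical encoder values
weights, context = dot_product_attention([0.0, 2.0], keys, values)
print(weights, context)
```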
We present OpenSeq2Seq, a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.
We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at low bitrate.
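To put 1.6 kb/s in perspective, the arithmetic below converts the bitrate into a bit budget per packet and compares it with uncompressed audio. The 40 ms packet length and the 16 kHz, 16-bit PCM reference are assumptions chosen for illustration, not figures taken from the abstract above.

```python
def bits_per_packet(bitrate_bps: int, packet_ms: int) -> int:
    # Integer arithmetic avoids floating-point rounding: bits = bps * ms / 1000.
    bits, remainder = divmod(bitrate_bps * packet_ms, 1000)
    if remainder:
        raise ValueError("bitrate does not divide evenly into this packet length")
    return bits


def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int) -> int:
    # Uncompressed mono PCM bitrate in bits per second.
    return sample_rate_hz * bits_per_sample


codec_bps = 1600                       # 1.6 kb/s
packet_bits = bits_per_packet(codec_bps, 40)   # assumed 40 ms packets
pcm_bps = pcm_bitrate(16000, 16)       # assumed 16 kHz, 16-bit reference
print(packet_bits, pcm_bps // codec_bps)
```

Under these assumptions each 40 ms of speech must fit in 64 bits, roughly a 160x reduction relative to the PCM reference.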