Speech synthesis is the task of generating speech from text.
Please note that the leaderboards here are not directly comparable between studies: they use mean opinion score (MOS) as the metric, and each study collects its own samples from Amazon Mechanical Turk.
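To make the comparability caveat concrete, here is a minimal sketch of how a mean opinion score might be computed from listener ratings, together with a rough 95% confidence interval. The function name `mean_opinion_score`, the 1-to-5 rating scale, the normal-approximation interval, and the two rating lists are all illustrative assumptions, not data or code from any of the listed papers.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mean, half-width of an approximate 95% CI) for listener ratings.

    Assumes ratings are on a 1-5 scale and uses a normal approximation,
    which is only a rough guide for small rater pools.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.mean(ratings)
    stderr = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, 1.96 * stderr


# Hypothetical ratings from two separate studies with different raters.
study_a = [4, 5, 4, 4, 5, 3, 4, 5]
study_b = [3, 4, 4, 3, 5, 4, 3, 4]

mos_a, ci_a = mean_opinion_score(study_a)
mos_b, ci_b = mean_opinion_score(study_b)

print(f"study A: MOS {mos_a:.2f} +/- {ci_a:.2f}")
print(f"study B: MOS {mos_b:.2f} +/- {ci_b:.2f}")

# True when the two intervals overlap, i.e. the gap may be rater noise.
intervals_overlap = (mos_a - ci_a) <= (mos_b + ci_b) and (mos_b - ci_b) <= (mos_a + ci_a)
print("intervals overlap:", intervals_overlap)
```

With small rater pools the intervals are wide, so a gap between two reported scores can fall within rater noise even before accounting for differences in the rated samples or the listening setup.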
(Image credit: WaveNet: A generative model for raw audio)
Clone a voice in 5 seconds to generate arbitrary speech in real-time
SOTA for Text-To-Speech Synthesis on LJSpeech (using extra training data)
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text.
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.
#4 best model for Speech Synthesis on North American English
In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms.
We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system.
We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at low bitrate.