
Speech Synthesis

27 papers with code · Speech

Speech synthesis is the task of generating speech from text.

Please note that the state-of-the-art tables here are not directly comparable across studies: they use mean opinion score (MOS) as the metric, and each study collects its ratings from a different sample of listeners on Amazon Mechanical Turk.
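MOS is simply the mean of listener ratings on a 1 to 5 scale. Its value and its uncertainty therefore depend on which listeners and samples were drawn, which is why MOS numbers from different studies do not line up. The sketch below computes a MOS with an approximate confidence interval. The interval uses a normal approximation (an assumption; published studies may report intervals differently), and the ratings are hypothetical.

```python
import math
from statistics import mean, stdev

def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate confidence interval).

    Uses a normal approximation (an assumption); z=1.96 gives roughly 95% coverage.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width

# Two hypothetical listener pools rating samples from the same system
pool_a = [4, 5, 4, 4, 5, 3, 4, 5]
pool_b = [3, 4, 4, 3, 4, 3, 5, 3]
for name, pool in (("pool A", pool_a), ("pool B", pool_b)):
    mos, hw = mos_with_ci(pool)
    print(f"{name}: MOS = {mos:.3f} +/- {hw:.3f}")
```

The two hypothetical pools give 4.25 and 3.625 for the same system. That gap comes from the raters, not the model.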

State-of-the-art leaderboards

Greatest papers with code

Efficient Neural Audio Synthesis

ICML 2018 CorentinJ/Real-Time-Voice-Cloning

The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time.

SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS
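One common way to obtain a network with few nonzero weights is magnitude pruning. During training, the smallest-magnitude weights are repeatedly set to zero while the target sparsity ramps up. The sketch below assumes a flat list of weights and a cubic ramp schedule. The function names and the schedule shape are illustrative assumptions, not the paper's code. This element-wise version also ignores any block structure the paper may use for efficient mobile kernels.

```python
def sparsity_at(step, target, start, duration):
    """Cubic ramp from 0 to `target` sparsity over [start, start + duration] (assumed schedule)."""
    if step <= start:
        return 0.0
    progress = min(1.0, (step - start) / duration)
    return target * (1.0 - (1.0 - progress) ** 3)

def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # how many weights to zero
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]

weights = [0.1, -0.9, 0.05, 0.4, -0.02, 0.7, 0.3, -0.6]  # hypothetical weights
for step in (0, 600, 1100):
    s = sparsity_at(step, target=0.5, start=100, duration=1000)
    print(step, round(s, 4), magnitude_prune(weights, s))
```

The cubic ramp prunes aggressively early, while many weights are still redundant. It then slows down as the target is approached, giving the remaining weights time to compensate.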

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 Dec 2017 CorentinJ/Real-Time-Voice-Cloning

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text.

SPEECH SYNTHESIS
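Tacotron 2 predicts mel-spectrogram frames: short-time spectra whose frequency axis has been warped onto the perceptual mel scale. A WaveNet vocoder then turns those frames into audio. The sketch below computes the band edges of a mel filterbank. It assumes the HTK-style mel formula. Its defaults (80 bands between 125 Hz and 7.6 kHz) are chosen to match the settings commonly cited for Tacotron 2. Treat all of these as assumptions.

```python
import math

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; some toolkits use a different variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Exact inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_mels=80, f_min=125.0, f_max=7600.0):
    """Return n_mels + 2 frequencies (Hz) evenly spaced on the mel scale.

    Triangular filter k spans edges[k] .. edges[k + 2] and peaks at edges[k + 1].
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]

edges = mel_band_edges()
print(len(edges), round(edges[1], 1), round(edges[-2], 1))
```

Because the spacing is uniform in mel, the bands are narrow at low frequencies and wide at high ones. This roughly mirrors how pitch resolution in human hearing falls off with frequency.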

WaveNet: A Generative Model for Raw Audio

12 Sep 2016 maciejkula/spotlight

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.

AUDIO GENERATION · SPEECH SYNTHESIS
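WaveNet treats raw audio as a sequence of discrete values. The paper applies μ-law companding and quantizes each sample to 256 levels, so the output layer can be a softmax over 256 classes. Below is a minimal sketch of that encoding and its inverse, assuming samples are floats in [-1, 1]. The function names are illustrative.

```python
import math

def mu_law_encode(x, mu=255):
    """Map a sample in [-1, 1] to an integer class in [0, mu] (mu + 1 = 256 classes)."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)  # compand to [-1, 1]
    return int(round((y + 1.0) / 2.0 * mu))  # quantize to mu + 1 levels

def mu_law_decode(q, mu=255):
    """Map an integer class back to an approximate sample in [-1, 1]."""
    y = 2.0 * q / mu - 1.0
    return math.copysign((math.pow(1.0 + mu, abs(y)) - 1.0) / mu, y)

for x in (-0.9, -0.1, 0.0, 0.01, 0.3, 0.9):
    q = mu_law_encode(x)
    print(x, q, round(mu_law_decode(q), 4))
```

Companding spends more of the 256 levels on quiet samples, so the round-trip error is smallest near zero and grows with amplitude.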

Tacotron: Towards End-to-End Speech Synthesis

29 Mar 2017 keithito/tacotron

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.

SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS
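The multi-stage design that Tacotron aims to replace can be pictured as three functions chained together: a text analysis frontend, an acoustic model and an audio synthesis module. The sketch below is only a structural illustration. The class name is hypothetical, and the toy stand-in stages are placeholders, not real models.

```python
from dataclasses import dataclass
from typing import Callable, List

Frames = List[List[float]]

@dataclass
class TTSPipeline:
    frontend: Callable[[str], List[str]]           # text analysis: text -> linguistic units
    acoustic_model: Callable[[List[str]], Frames]  # units -> acoustic feature frames
    vocoder: Callable[[Frames], List[float]]       # frames -> waveform samples

    def synthesize(self, text: str) -> List[float]:
        units = self.frontend(text)
        frames = self.acoustic_model(units)
        return self.vocoder(frames)

# Hypothetical toy stages, only to show the data flowing through the chain
toy = TTSPipeline(
    frontend=lambda text: [c for c in text.lower() if c.isalpha()],
    acoustic_model=lambda units: [[float(ord(u))] for u in units],
    vocoder=lambda frames: [v for frame in frames for v in frame * 2],
)
print(toy.synthesize("Hi!"))
```

Each stage is built and tuned separately, and its output is the next stage's input. Tacotron replaces this chain with a single model that synthesizes speech directly from characters.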

WaveGlow: A Flow-based Generative Network for Speech Synthesis

31 Oct 2018 NVIDIA/waveglow

In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms.

SPEECH SYNTHESIS
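Like Glow, WaveGlow is built from invertible layers, so it can be trained by maximum likelihood and sampled in a single pass. Its core building block is an affine coupling layer. Half of the channels pass through unchanged and determine a scale and a shift applied to the other half. This makes both the inverse and the log-determinant of the Jacobian cheap to compute. The minimal sketch below works on a flat list of numbers. `_scale_shift` is a hypothetical stand-in for the learned network; in WaveGlow that network is also conditioned on the mel-spectrogram, which is omitted here.

```python
import math

def _scale_shift(xa):
    # Hypothetical stand-in for the learned network: a log-scale and a shift per element.
    # Any function of xa works, because xa passes through the layer unchanged.
    log_s = [0.5 * math.tanh(v) for v in xa]
    t = [0.1 * v for v in xa]
    return log_s, t

def coupling_forward(x):
    """Return (y, log|det J|) for an affine coupling on an even-length list."""
    if len(x) % 2:
        raise ValueError("expected an even number of elements")
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = _scale_shift(xa)
    yb = [math.exp(ls) * b + sh for ls, b, sh in zip(log_s, xb, t)]
    return xa + yb, sum(log_s)  # Jacobian is triangular: log-det is the sum of log-scales

def coupling_inverse(y):
    """Exactly undo coupling_forward."""
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    log_s, t = _scale_shift(ya)  # ya == xa, so the same scale and shift are recovered
    xb = [(b - sh) * math.exp(-ls) for ls, b, sh in zip(log_s, yb, t)]
    return ya + xb

x = [0.3, -1.2, 0.8, 2.0]
y, log_det = coupling_forward(x)
print(y, log_det, coupling_inverse(y))
```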

Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

ICLR 2018 r9y9/deepvoice3_pytorch

We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system.

SPEECH SYNTHESIS
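In an attention-based TTS decoder, each output step computes how strongly it should attend to every encoded input position. It then takes a weighted sum of the encoder values. A generic dot-product attention step is sketched below. The scaling by the square root of the dimension is an assumption. The positional encodings and other details Deep Voice 3 adds are omitted, and the keys and values are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(query, keys, values):
    """One decoder step: return (context vector, attention weights over encoder positions)."""
    scale = 1.0 / math.sqrt(len(query))  # assumed scaling
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return context, weights

keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # hypothetical encoder keys
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical encoder values
context, weights = dot_product_attention([2.0, 0.0], keys, values)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Because every step only needs dot products and a softmax, all positions can be scored in parallel. This fits a fully convolutional design better than recurrent attention does.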

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq

25 May 2018 NVIDIA/OpenSeq2Seq

We present OpenSeq2Seq, a TensorFlow-based toolkit for training sequence-to-sequence models that supports distributed and mixed-precision training.

MACHINE TRANSLATION · SPEECH RECOGNITION · SPEECH SYNTHESIS
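Mixed-precision training keeps most arithmetic in float16, whose small dynamic range can make tiny gradients underflow to zero. A widely used remedy is loss scaling. The loss is multiplied by a large factor before backpropagation, the gradients are divided by the same factor before the update, and the factor is backed off whenever an overflow appears. The class below is a generic sketch of that backoff scheme on plain Python floats. Its name and defaults are hypothetical, and it is not OpenSeq2Seq's implementation.

```python
import math

class DynamicLossScaler:
    """Generic backoff loss scaling (hypothetical class, not a library API).

    The caller multiplies the loss by `self.scale` before backpropagation and
    passes the resulting gradients to `unscale`.
    """

    def __init__(self, init_scale=2.0 ** 15, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def unscale(self, grads):
        """Return gradients divided by the scale, or None to signal 'skip this step'."""
        if any(not math.isfinite(g) for g in grads):
            # Overflow: halve the scale (keeping it >= 1) and skip the update.
            self.scale = max(1.0, self.scale / 2.0)
            self._clean_steps = 0
            return None
        unscaled = [g / self.scale for g in grads]  # divide by the scale the loss was multiplied by
        self._clean_steps += 1
        if self._clean_steps >= self.growth_interval:
            # A long run without overflow: try a larger scale from the next step on.
            self.scale *= 2.0
            self._clean_steps = 0
        return unscaled

scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
print(scaler.unscale([16.0, 8.0]), scaler.scale)     # clean step
print(scaler.unscale([8.0]), scaler.scale)           # second clean step doubles the scale
print(scaler.unscale([float("inf")]), scaler.scale)  # overflow halves it and skips the step
```

The gradients are unscaled before the scale is updated. This ordering matters: they were computed with the old factor, so dividing by a freshly doubled scale would shrink them by half.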

Deep Voice: Real-time Neural Text-to-Speech

ICML 2017 NVIDIA/nv-wavenet

We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks.

BOUNDARY DETECTION · FEATURE ENGINEERING · SPEECH SYNTHESIS

A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

28 Mar 2019 mozilla/LPCNet

We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at low bitrate.

SPEECH SYNTHESIS
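A bitrate translates directly into a per-frame bit budget, which shows how little information a 1.6 kb/s codec transmits. The sketch below does that arithmetic. The 10 ms frame and 40 ms packet durations, and the 16 kHz, 16-bit PCM reference for wideband speech, are illustrative assumptions, as are the function names.

```python
def bit_budget(bitrate_bps, duration_ms):
    """Bits available for `duration_ms` of audio at `bitrate_bps`."""
    return bitrate_bps * duration_ms / 1000.0

def compression_ratio(bitrate_bps, sample_rate_hz=16000, bits_per_sample=16):
    """How many times smaller the coded stream is than raw PCM (assumed 16 kHz, 16-bit)."""
    return sample_rate_hz * bits_per_sample / bitrate_bps

rate = 1600  # 1.6 kb/s
print(bit_budget(rate, 10), bit_budget(rate, 40), compression_ratio(rate))
```

At 1.6 kb/s, a 10 ms frame gets 16 bits and a 40 ms packet gets 64 bits. The stream is 160 times smaller than 16 kHz, 16-bit PCM.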