Speech Synthesis

61 papers with code · Speech

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable between studies, because each study uses mean opinion score (MOS) as its metric and collects a different sample of ratings from Amazon Mechanical Turk.
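To make the comparability caveat concrete, here is a minimal sketch of how a mean opinion score and an approximate 95% confidence interval could be computed from listener ratings. The function name `mean_opinion_score`, the 1-to-5 rating scale, and the two rating lists are assumptions made for illustration; they do not come from any study listed on this page.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mean, half_width) for a normal-approximation confidence interval."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    stderr = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, z * stderr


# Hypothetical ratings from two unrelated listener pools (made-up values).
study_a = [4, 5, 4, 4, 5, 3, 4, 5]
study_b = [3, 4, 4, 3, 5, 4, 3, 4]

for name, ratings in (("study A", study_a), ("study B", study_b)):
    mos, half_width = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} ± {half_width:.2f}")
```

Because each pool rates different utterances and consists of different listeners, a gap between two such scores may reflect the raters or the samples rather than the models.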

(Image credit: WaveNet: A Generative Model for Raw Audio)

Greatest papers with code

Efficient Neural Audio Synthesis

ICML 2018 CorentinJ/Real-Time-Voice-Cloning

The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time.

SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS

Tacotron: Towards End-to-End Speech Synthesis

29 Mar 2017 mozilla/TTS

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.

SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS

WaveNet: A Generative Model for Raw Audio

12 Sep 2016 maciejkula/spotlight

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.

AUDIO GENERATION SPEECH SYNTHESIS

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 Dec 2017 NVIDIA/tacotron2

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text.

SPEECH SYNTHESIS

WaveGlow: A Flow-based Generative Network for Speech Synthesis

31 Oct 2018 NVIDIA/waveglow

In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms.

SPEECH SYNTHESIS

Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

ICLR 2018 r9y9/deepvoice3_pytorch

We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system.

SPEECH SYNTHESIS

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq

25 May 2018 NVIDIA/OpenSeq2Seq

We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.

MACHINE TRANSLATION SPEECH RECOGNITION SPEECH SYNTHESIS

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

8 Jun 2020 TensorSpeech/TensorflowTTS

In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with ground-truth target instead of the simplified output from teacher, and 2) introducing more variation information of speech (e.g., pitch, energy and more accurate duration) as conditional inputs.

SPEECH SYNTHESIS
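The FastSpeech 2 summary above names duration as one of the conditional inputs. As a rough illustration of what conditioning on duration can mean, the sketch below repeats each phoneme-level vector for its number of frames, an operation the FastSpeech papers refer to as a length regulator. The function name `length_regulate`, the two-dimensional vectors, and the duration values are all assumptions chosen for this toy example; it is not the authors' implementation and leaves out pitch and energy.

```python
def length_regulate(phoneme_states, durations):
    """Repeat each phoneme-level vector durations[i] times to get frame-level vectors."""
    if len(phoneme_states) != len(durations):
        raise ValueError("one duration per phoneme is required")
    frames = []
    for state, n_frames in zip(phoneme_states, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector for each frame so frames do not share one list object.
        frames.extend(list(state) for _ in range(n_frames))
    return frames


# Hypothetical hidden states for three phonemes (made-up values).
states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 1, 3]  # frames per phoneme, also made up

frames = length_regulate(states, durations)
print(len(frames), "frames from", len(states), "phonemes")
```

The output length equals the sum of the durations, which is how a non-autoregressive model of this kind can decide the length of the spectrogram before generating it.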

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

25 Oct 2019TensorSpeech/TensorflowTTS

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network.

SPEECH SYNTHESIS TEST RESULTS TEXT-TO-SPEECH SYNTHESIS