Speech Synthesis

291 papers with code • 4 benchmarks • 19 datasets

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable across studies: they use mean opinion score as a metric, and each study collects ratings from a different sample of Amazon Mechanical Turk workers.
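To make the metric concrete, here is a minimal sketch of how a mean opinion score is typically computed: the average of listener ratings on a 1-5 scale, often reported with a 95% confidence interval. The ratings below are illustrative, not real data.

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Mean opinion score: average of 1-5 listener ratings, with a
    normal-approximation 95% confidence interval."""
    n = len(ratings)
    mos = statistics.mean(ratings)
    ci = 1.96 * statistics.stdev(ratings) / math.sqrt(n) if n > 1 else 0.0
    return mos, ci

# Illustrative ratings from 8 hypothetical listeners.
ratings = [4, 5, 4, 3, 4, 5, 4, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

Because both the listener pool and the sample of utterances differ between studies, two systems with the same MOS from different evaluations are not necessarily equally good.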

(Image credit: WaveNet: A Generative Model for Raw Audio)

Most implemented papers

Exploring Transfer Learning for Low Resource Emotional TTS

Emotional-Text-to-Speech/dl-for-emo-tts Advances in Intelligent Systems and Computing 2019

Over the last few years, spoken language technologies have improved substantially thanks to deep learning.

MelNet: A Generative Model for Audio in the Frequency Domain

fatchord/MelNet 4 Jun 2019

Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.
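The scale problem MelNet's abstract points to can be made concrete with a bit of arithmetic. The sample rate and spectrogram settings below are common defaults for illustration, not MelNet's exact configuration.

```python
# Illustrative arithmetic: why raw waveforms are very long sequences.
# Sample rate and STFT settings are common defaults, not MelNet's values.
sample_rate = 22050  # audio samples per second
hop_length = 256     # samples between successive spectrogram frames

waveform_steps = sample_rate * 1                 # timesteps in 1 s of raw audio
spectrogram_frames = sample_rate // hop_length   # frames in 1 s after the STFT

print(waveform_steps)      # tens of thousands of autoregressive steps
print(spectrogram_frames)  # far fewer steps along the time axis
```

Modeling in the frequency domain trades a sequence of tens of thousands of scalar timesteps for a few dozen frames per second, each carrying many frequency bins, which makes long-range structure easier to capture.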

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

r9y9/gantts 23 Sep 2017

In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator.
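The weighted-sum objective described above can be sketched in a few lines. Everything here is a toy stand-in: the function names and the weight value are hypothetical, and a real implementation such as r9y9/gantts trains neural networks rather than these stubs.

```python
import math

def generation_loss(generated, target):
    """Conventional minimum generation loss (mean squared error here)."""
    return sum((g - t) ** 2 for g, t in zip(generated, target)) / len(target)

def adversarial_loss(discriminator_score):
    """Loss for deceiving the discriminator: small when the discriminator
    scores the generated parameters as natural (score near 1)."""
    return -math.log(max(discriminator_score, 1e-12))

def acoustic_model_loss(generated, target, discriminator_score, weight=0.25):
    """Weighted sum the acoustic model minimizes (the weight is illustrative)."""
    return (generation_loss(generated, target)
            + weight * adversarial_loss(discriminator_score))
```

The discriminator pushes the generated speech parameters toward the natural-parameter distribution, while the conventional generation loss keeps them close to the target trajectory.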

Tools and resources for Romanian text-to-speech and speech-to-text applications

racai-ai/TEPROLIN 15 Feb 2018

In this paper we introduce a set of resources and tools aimed at providing support for natural language processing, text-to-speech synthesis and speech recognition for Romanian.

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

PaddlePaddle/DeepSpeech 9 Jul 2019

We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages.

DurIAN: Duration Informed Attention Network For Multimodal Synthesis

ivanvovk/durian-pytorch 4 Sep 2019

In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously.

WaveFlow: A Compact Flow-based Model for Raw Audio

PaddlePaddle/Parakeet ICML 2020

WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases.

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

pytorch/fairseq 14 Sep 2021

This paper presents fairseq S^2, a fairseq extension for speech synthesis.

Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme

huawei-noah/Speech-Backbones ICLR 2022

Voice conversion is a common speech synthesis task which can be solved in different ways depending on a particular real-world scenario.

A Critical Review of Recurrent Neural Networks for Sequence Learning

junwang23/deepdirtycodes 29 May 2015

Recurrent neural networks (RNNs) are connectionist models that capture the dynamics of sequences via cycles in the network of nodes.