Speech Synthesis

294 papers with code • 4 benchmarks • 19 datasets

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable across studies, as they use mean opinion score (MOS) as the metric and collect ratings from different pools of Amazon Mechanical Turk workers.
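
MOS is simply the arithmetic mean of listener ratings on a 1-5 scale, so differences between systems are only meaningful relative to their confidence intervals. A minimal sketch of computing MOS with an approximate 95% confidence interval (the ratings below are invented purely for illustration):

```python
import numpy as np

# Invented listener ratings for one system (1 = bad, 5 = excellent)
ratings = np.array([4, 5, 4, 3, 5, 4, 4, 5, 3, 4], dtype=float)

mos = ratings.mean()
# Approximate 95% confidence interval via the normal approximation
ci95 = 1.96 * ratings.std(ddof=1) / np.sqrt(len(ratings))

print(f"MOS = {mos:.2f} +/- {ci95:.2f}")
```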

(Image credit: WaveNet: A Generative Model for Raw Audio)

Most implemented papers

Deep Voice: Real-time Neural Text-to-Speech

NVIDIA/nv-wavenet ICML 2017

We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks.

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq

NVIDIA/OpenSeq2Seq 25 May 2018

We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.
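
OpenSeq2Seq itself is a TensorFlow toolkit; as a rough illustration of the mixed-precision idea (FP16 forward and backward passes with dynamic loss scaling so small gradients do not underflow), here is a sketch using PyTorch's torch.cuda.amp rather than the toolkit's own API:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(80, 80).cuda()   # stand-in for a real seq2seq model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = GradScaler()                     # handles dynamic loss scaling

x = torch.randn(16, 80, device="cuda")
target = torch.randn(16, 80, device="cuda")

with autocast():                          # run the forward pass in FP16 where safe
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()             # scale the loss so FP16 gradients do not underflow
scaler.step(optimizer)                    # unscales gradients, then takes the optimizer step
scaler.update()                           # adapts the loss scale for the next iteration
```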

High Fidelity Speech Synthesis with Adversarial Networks

mbinkowski/DeepSpeechDistances ICLR 2020

However, the application of generative adversarial networks (GANs) in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech.

Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

coqui-ai/TTS 23 Oct 2019

Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in frequency for out-of-domain text.

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

NVIDIA/flowtron ICLR 2021

In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.
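
The core mechanism here is a normalizing flow: an invertible transformation whose exact log-determinant makes likelihood training possible. A toy sketch of a single affine flow step follows; in Flowtron the scale and shift for each mel frame would be predicted autoregressively from previous frames and the text encoding, whereas the values below are placeholders:

```python
import torch

def affine_flow_step(x_t, log_s, b):
    # Invertible scale-and-shift: maps a data frame x_t to a latent z_t
    z_t = (x_t - b) * torch.exp(-log_s)
    log_det = -log_s.sum()   # contribution to the change-of-variables log-likelihood
    return z_t, log_det

# Toy usage for a single 80-dimensional mel frame with placeholder parameters
x_t = torch.randn(80)
z_t, log_det = affine_flow_step(x_t, log_s=torch.zeros(80), b=torch.zeros(80))
```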

SpeedySpeech: Efficient Neural Speech Synthesis

janvainer/speedyspeech 9 Aug 2020

While recent neural sequence-to-sequence models have greatly improved the quality of speech synthesis, there has not been a system capable of fast training, fast inference and high-quality audio synthesis at the same time.

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

maum-ai/wavegrad2 17 Jun 2021

The model takes an input phoneme sequence, and through an iterative refinement process, generates an audio waveform.
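
Schematically, the iterative refinement is diffusion-style sampling: start from Gaussian noise and repeatedly apply a learned denoising step conditioned on features derived from the phoneme sequence. A heavily simplified sketch of that loop, where denoise_step is a hypothetical stand-in for the trained network:

```python
import torch

def refine_waveform(denoise_step, cond, num_steps=50, num_samples=24000):
    """Iterative refinement sketch: noise in, waveform out.
    `denoise_step(y, cond, t)` is a hypothetical stand-in for the trained model,
    `cond` for the phoneme-derived conditioning features."""
    y = torch.randn(1, num_samples)        # start from pure Gaussian noise
    for t in reversed(range(num_steps)):   # refine from the noisiest step to the cleanest
        y = denoise_step(y, cond, t)       # each step removes a little more noise
    return y
```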

One TTS Alignment To Rule Them All

coqui-ai/TTS 23 Aug 2021

However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

microsoft/speecht5 ACL 2022

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.
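
The TTS variant of microsoft/speecht5 is also distributed through Hugging Face transformers; assuming the SpeechT5 classes and the microsoft/speecht5_tts and microsoft/speecht5_hifigan checkpoints, a minimal inference sketch looks like this:

```python
import torch
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Speech synthesis with SpeechT5.", return_tensors="pt")

# 512-dimensional x-vector speaker embedding; in practice this comes from a
# speaker-verification model, zeros are only a placeholder for the sketch.
speaker_embeddings = torch.zeros(1, 512)

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
```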

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

coqui-ai/TTS 4 Dec 2021

YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS.
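
The coqui-ai/TTS repository listed above ships a YourTTS checkpoint; assuming its Python API and the model name below, a zero-shot voice-cloning call might look like this:

```python
from TTS.api import TTS

# Model name as assumed from the coqui-ai/TTS release of YourTTS
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

# Zero-shot synthesis: condition on a short reference clip of an unseen speaker
tts.tts_to_file(
    text="This sentence should come out in the reference speaker's voice.",
    speaker_wav="reference_speaker.wav",   # hypothetical path to a reference recording
    language="en",
    file_path="yourtts_output.wav",
)
```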