Text-To-Speech Synthesis
93 papers with code • 6 benchmarks • 17 datasets
Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.
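Most modern systems decompose the task into an acoustic model that maps text to intermediate acoustic features (typically a mel-spectrogram) and a vocoder that converts those features into a waveform. A minimal sketch of that two-stage structure, with every component an illustrative stub (the hop length, mel-channel count, and "five frames per phoneme" heuristic are assumptions, not any particular system's values):

```python
import numpy as np

# Hypothetical two-stage TTS pipeline: acoustic model + vocoder.
# All components here are stubs for illustration, not a real model.

HOP_LENGTH = 256   # waveform samples per mel frame (assumed common default)
N_MELS = 80        # mel-spectrogram channels (assumed common default)

def text_to_phonemes(text: str) -> list:
    """Toy front end: treat each letter as one 'phoneme'."""
    return [c for c in text.lower() if c.isalpha()]

def acoustic_model(phonemes: list) -> np.ndarray:
    """Stub acoustic model: emit ~5 mel frames per phoneme."""
    n_frames = 5 * len(phonemes)
    return np.zeros((n_frames, N_MELS))

def vocoder(mel: np.ndarray) -> np.ndarray:
    """Stub vocoder: each mel frame covers HOP_LENGTH waveform samples."""
    return np.zeros(mel.shape[0] * HOP_LENGTH)

def synthesize(text: str) -> np.ndarray:
    mel = acoustic_model(text_to_phonemes(text))
    return vocoder(mel)

wav = synthesize("Hello world")
print(wav.shape)  # (12800,): 10 letters * 5 frames * 256 samples
```

The point of the split is that the two stages can be trained and swapped independently, which is why most of the papers below target one stage or the other.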
Libraries
Use these libraries to find Text-To-Speech Synthesis models and implementations
Datasets
Most implemented papers
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Singing voice synthesis (SVS) systems are built to synthesize high-quality, expressive singing voices, in which the acoustic model generates the acoustic features (e.g., a mel-spectrogram) given a music score.
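A mel-spectrogram of the kind such acoustic models predict can be computed directly from a waveform: a short-time Fourier transform followed by projection through a triangular mel filterbank. A self-contained numpy sketch (the sample rate, FFT size, hop length, and mel count are common defaults, not DiffSinger's settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):                    # rising edge
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                    # falling edge
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(wav, sr=22050, n_fft=1024, hop=256, n_mels=80):
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(wav) - n_fft + 1, hop):
        spec = np.abs(np.fft.rfft(wav[start:start + n_fft] * window)) ** 2
        frames.append(spec)
    power = np.array(frames)                           # (T, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (T, n_mels)
    return np.log(mel + 1e-6)                          # log compression

wav = np.sin(2 * np.pi * 440 * np.arange(22050) / 22050)  # 1 s, 440 Hz tone
M = mel_spectrogram(wav)
print(M.shape)  # (83, 80): 83 frames of 80 mel channels
```

In practice a library routine (e.g. a maintained audio-processing package) would be used instead; the sketch just shows why the representation is compact: one second of audio becomes a few dozen frames.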
Neural Speech Synthesis with Transformer Network
Although end-to-end neural text-to-speech (TTS) methods such as Tacotron 2 have been proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) difficulty modeling long-range dependencies with current recurrent neural networks (RNNs).
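The Transformer alternative replaces that recurrence with scaled dot-product attention, in which every output position attends to every input position in a single matrix product, so nothing has to be unrolled step by step. A numpy sketch of the core operation (the sequence lengths and dimensions are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    All pairwise interactions are computed at once, which is what
    removes the sequential bottleneck of an RNN decoder.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (T_q, T_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # (T_q, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(50, 64))    # e.g. 50 decoder positions
K = rng.normal(size=(120, 64))   # e.g. 120 encoder (phoneme) positions
V = rng.normal(size=(120, 64))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (50, 64)
```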
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Recently, denoising diffusion probabilistic models and generative score matching have shown high potential for modelling complex data distributions, while stochastic calculus has provided a unified point of view on these techniques, allowing for flexible inference schemes.
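The forward (noising) half of such a diffusion model has a simple closed form: after t steps of a variance schedule, a clean sample x0 is distributed as a Gaussian whose signal scale shrinks as the cumulative product of (1 - beta) decays. A numpy sketch (the linear schedule values are illustrative defaults, not Grad-TTS's score-based formulation):

```python
import numpy as np

# Sketch of the DDPM-style forward (noising) process that diffusion
# TTS models learn to invert. Schedule values are illustrative only.

T = 100
betas = np.linspace(1e-4, 0.02, T)       # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)     # cumulative signal retention

def q_sample(x0, t, rng):
    """Closed-form draw from q(x_t | x_0) = N(sqrt(ab_t) x0, (1 - ab_t) I)."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

rng = np.random.default_rng(0)
x0 = np.ones(80)                  # stand-in for one mel frame
xT = q_sample(x0, T - 1, rng)     # heavily noised, close to pure Gaussian
print(xT.shape)
```

Generation runs this process in reverse: a network trained to predict the injected noise (or the score) is applied iteratively, starting from Gaussian noise.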
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
In addition, we find that VALL-E can preserve the speaker's emotion and the acoustic environment of the acoustic prompt in synthesis.
Exploring Transfer Learning for Low Resource Emotional TTS
Over the last few years, spoken language technologies have improved considerably thanks to deep learning.
MelNet: A Generative Model for Audio in the Frequency Domain
Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.
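The scale mismatch is easy to quantify: one second of raw audio spans tens of thousands of samples, while a frequency-domain representation of the same second spans only tens of frames. A quick back-of-envelope comparison (a hop length of 256 samples per frame is an assumed common value):

```python
# Samples per second of raw audio vs. spectrogram frames per second,
# assuming a hop length of 256 samples between frames.
HOP = 256
for sr in (16000, 22050, 44100):
    print(sr, "samples/s ->", sr // HOP, "frames/s")
```

This two-orders-of-magnitude reduction in sequence length is what makes frequency-domain generative modelling attractive for long-range structure.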
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
By leveraging the properties of flows, MAS searches for the most probable monotonic alignment between text and the latent representation of speech.
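The search itself is a small dynamic program over a grid of log-likelihoods: Q[i, j] holds the best score of a monotonic path that assigns mel frame j to text token i, where the token index can only stay the same or advance by one per frame. A simplified illustration (the toy likelihood table is made up; the real algorithm runs on the model's latent likelihoods):

```python
import numpy as np

def monotonic_alignment_search(log_lik):
    """Viterbi-style DP over a (text_len, mel_len) log-likelihood grid.

    Returns a 0/1 matrix assigning each mel frame to exactly one text
    token, with the token index never decreasing. A simplified
    illustration of the MAS idea used in Glow-TTS.
    """
    n_text, n_mel = log_lik.shape
    Q = np.full((n_text, n_mel), -np.inf)
    Q[0, 0] = log_lik[0, 0]
    for j in range(1, n_mel):
        for i in range(min(j + 1, n_text)):
            stay = Q[i, j - 1]
            advance = Q[i - 1, j - 1] if i > 0 else -np.inf
            Q[i, j] = log_lik[i, j] + max(stay, advance)
    # Backtrack from the bottom-right corner.
    align = np.zeros_like(log_lik, dtype=int)
    i = n_text - 1
    for j in range(n_mel - 1, -1, -1):
        align[i, j] = 1
        if j > 0 and (i == 0 or Q[i, j - 1] >= Q[i - 1, j - 1]):
            pass          # best path stayed on the same token
        elif j > 0:
            i -= 1        # best path advanced from the previous token
    return align

# Toy example: 3 tokens, 6 frames; likelihoods favor a 2-2-2 split.
L = np.log(np.array([
    [.8, .8, .1, .1, .1, .1],
    [.1, .1, .8, .8, .1, .1],
    [.1, .1, .1, .1, .8, .8],
]))
A = monotonic_alignment_search(L)
print(A.sum(axis=0))  # [1 1 1 1 1 1]: one token per frame
```

The recovered alignment doubles as a duration target, which is what lets Glow-TTS train without an external aligner.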
Tools and resources for Romanian text-to-speech and speech-to-text applications
In this paper we introduce a set of resources and tools aimed at providing support for natural language processing, text-to-speech synthesis and speech recognition for Romanian.
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
In this paper, we propose Flowtron, an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.
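A normalizing flow's defining property is closed-form invertibility: each step is a bijection, so samples can be mapped to latent space and back exactly. A minimal (non-autoregressive) affine flow step, purely for illustration; Flowtron's actual flows are autoregressive and conditioned on text and speaker:

```python
import numpy as np

# Minimal affine flow step: z = (x - mu) * exp(-log_s), inverted exactly.
# Illustrative parameters only; in a real flow mu and log_s are
# predicted by a network.

def forward(x, mu, log_s):
    return (x - mu) * np.exp(-log_s)

def inverse(z, mu, log_s):
    return z * np.exp(log_s) + mu

x = np.array([1.0, 2.0, 3.0])
mu, log_s = 0.5, np.log(2.0)
z = forward(x, mu, log_s)
print(np.allclose(inverse(z, mu, log_s), x))  # True: exact invertibility
```

This exact invertibility is what gives flow-based TTS models tractable likelihoods and controllable latent variables.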
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
The model takes an input phoneme sequence, and through an iterative refinement process, generates an audio waveform.
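That refinement loop can be caricatured as repeatedly denoising from pure noise toward a waveform. In the sketch below the learned network is replaced by a hypothetical toy denoiser that simply pulls the sample toward a fixed target, just to show the shape of the loop:

```python
import numpy as np

# Iterative refinement caricature: start from Gaussian noise and apply
# a denoising step repeatedly. 'denoise_step' is a toy stand-in for the
# learned noise-prediction network, not WaveGrad 2's model.

def denoise_step(x, target):
    """Toy denoiser: move 10% of the way toward a fixed target."""
    return x + 0.1 * (target - x)

rng = np.random.default_rng(0)
target = np.sin(np.linspace(0, 2 * np.pi, 512))  # stand-in 'waveform'
x = rng.normal(size=512)                          # start from pure noise
for _ in range(100):                              # refine iteratively
    x = denoise_step(x, target)
err = float(np.abs(x - target).mean())
print(err < 0.01)  # True: the iterates converged toward the target
```

In the real model each step is conditioned on the phoneme sequence and a noise-level embedding, and fewer, larger steps trade quality for speed.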