Text-To-Speech Synthesis

92 papers with code • 6 benchmarks • 17 datasets

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Benchmarks

These leaderboards track progress in Text-To-Speech Synthesis.

Dataset	Best Model
LJSpeech	NaturalSpeech
CMUDict 0.7b	Token-Level Ensemble Distillation
20000 utterances	Mia
HUI speech corpus	Tacotron 2
Thorsten voice 21.02 neutral	Tacotron 2
Trinity Speech-Gesture Dataset	Match-TTSG

Libraries

Use these libraries to find Text-To-Speech Synthesis models and implementations.

Library	Papers	Stars
PaddlePaddle/PaddleSpeech	12	10,118
coqui-ai/TTS	10	29,084
keonlee9420/Expressive-FastSpeech2	5	258
TensorSpeech/TensorflowTTS	4	3,697

See all 12 libraries.

Most implemented papers

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

coqui-ai/TTS • ICLR 2021

In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with the ground-truth target instead of the simplified output from a teacher model, and 2) introducing more variation information of speech (e.g., pitch, energy and more accurate duration) as conditional inputs.
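The idea of feeding pitch and energy as conditional inputs can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes each value is quantized into a bin, mapped to an embedding vector, and added to the hidden state of the same position. The function names, bin boundaries, and embedding tables are hypothetical.

```python
import bisect


def quantize(value, boundaries):
    """Return the index of the bin that `value` falls into (boundaries sorted ascending)."""
    return bisect.bisect_right(boundaries, value)


def add_variance_embeddings(hidden, pitch, energy,
                            pitch_bounds, energy_bounds,
                            pitch_table, energy_table):
    """Add a pitch embedding and an energy embedding to each hidden vector.

    hidden: list of vectors (one per position); pitch/energy: one scalar per position.
    pitch_table/energy_table: hypothetical lookup tables, one vector per bin.
    """
    conditioned = []
    for h, p, e in zip(hidden, pitch, energy):
        p_vec = pitch_table[quantize(p, pitch_bounds)]
        e_vec = energy_table[quantize(e, energy_bounds)]
        conditioned.append([hi + pi + ei for hi, pi, ei in zip(h, p_vec, e_vec)])
    return conditioned
```

In a trained system the tables would be learned and the pitch and energy values would be predicted at inference time; here they are plain lists so the data flow stays visible.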

Tacotron: Towards End-to-End Speech Synthesis

CorentinJ/Real-Time-Voice-Cloning • 29 Mar 2017

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.

Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

coqui-ai/TTS • 24 Oct 2017

This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without use of any recurrent units.

FastSpeech: Fast, Robust and Controllable Text to Speech

coqui-ai/TTS • NeurIPS 2019

In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS.
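Parallel mel-spectrogram generation needs the phoneme sequence stretched to the length of the output frames before decoding. FastSpeech does this with a length regulator; the sketch below shows only that expansion step, assuming one integer duration (in frames) per phoneme. The function name and the toy vectors are illustrative and not taken from any library.

```python
def length_regulate(hidden, durations):
    """Repeat each phoneme's hidden vector `duration` times.

    hidden: list of vectors, one per phoneme.
    durations: list of non-negative integers, one per phoneme (frames to emit).
    Returns a frame-level sequence whose length is sum(durations).
    """
    if len(hidden) != len(durations):
        raise ValueError("hidden and durations must have the same length")
    frames = []
    for vector, duration in zip(hidden, durations):
        # Copy the vector so that frames do not share one mutable list.
        frames.extend(list(vector) for _ in range(duration))
    return frames
```

Scaling every duration by a constant before expansion is one simple way to speed up or slow down the resulting speech.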

Efficient Neural Audio Synthesis

CorentinJ/Real-Time-Voice-Cloning • ICML 2018

The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time.

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

coqui-ai/TTS • 25 Oct 2019

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network.
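The "multi-resolution spectrogram" in the title refers to an auxiliary loss that compares generated and reference audio in the STFT domain at several frame sizes. The sketch below is a toy, standard-library version under stated assumptions: it uses a naive DFT, a Hann window, and small illustrative resolutions rather than the settings used in the paper, and it combines a spectral-convergence term with a log-magnitude term per resolution. All function names are hypothetical.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len, hop):
    """Magnitude spectrogram via a naive DFT over Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            mags.append(abs(coeff))
        frames.append(mags)
    return frames


def stft_loss(pred, target, frame_len, hop, eps=1e-7):
    """Spectral convergence plus mean absolute log-magnitude difference."""
    p = [m for frame in stft_magnitudes(pred, frame_len, hop) for m in frame]
    t = [m for frame in stft_magnitudes(target, frame_len, hop) for m in frame]
    diff_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(t, p)))
    target_norm = math.sqrt(sum(a ** 2 for a in t))
    spectral_convergence = diff_norm / (target_norm + eps)
    log_magnitude = sum(abs(math.log(a + eps) - math.log(b + eps))
                        for a, b in zip(t, p)) / len(t)
    return spectral_convergence + log_magnitude


def multi_resolution_stft_loss(pred, target, resolutions=((16, 4), (32, 8), (64, 16))):
    """Average the single-resolution loss over (frame_len, hop) pairs."""
    losses = [stft_loss(pred, target, n, h) for n, h in resolutions]
    return sum(losses) / len(losses)
```

Both signals must have the same length and be at least as long as the largest frame. A practical implementation would use an FFT from a numerical library instead of the quadratic-time DFT shown here.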

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

PaddlePaddle/PaddleSpeech • ICML 2018

In this work, we propose "global style tokens" (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system.
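A bank of style embeddings can be combined into one style vector by attending over the tokens with a reference embedding as the query. The sketch below is a simplified single-head, dot-product version with hypothetical names; it is not the attention module described in the paper, and the tokens are plain lists rather than learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(reference, tokens):
    """Weight each style token by its similarity to the reference embedding.

    reference: query vector summarizing a reference utterance (assumed given).
    tokens: list of token vectors, all with the same dimension as `reference`.
    Returns (style_vector, attention_weights).
    """
    scores = [sum(r * t for r, t in zip(reference, token)) for token in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    style = [sum(w * token[i] for w, token in zip(weights, tokens)) for i in range(dim)]
    return style, weights
```

Because the style vector is a weighted sum of tokens, the weights can also be set by hand at inference time to steer the style without a reference utterance.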

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

CorentinJ/Real-Time-Voice-Cloning • NeurIPS 2018

Clone a voice in 5 seconds to generate arbitrary speech in real-time

FastSpeech: Fast, Robust and Controllable Text-to-Speech

PaddlePaddle/PaddleSpeech • 22 May 2019

Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control).

WaveGrad: Estimating Gradients for Waveform Generation

coqui-ai/TTS • ICLR 2021

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density.
