Text-To-Speech Synthesis

92 papers with code • 6 benchmarks • 17 datasets

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Benchmarks

These leaderboards track progress in Text-To-Speech Synthesis.

Dataset	Best Model
LJSpeech	NaturalSpeech
CMUDict 0.7b	Token-Level Ensemble Distillation
20000 utterances	Mia
HUI speech corpus	Tacotron 2
Thorsten voice 21.02 neutral	Tacotron 2
Trinity Speech-Gesture Dataset	Match-TTSG

Libraries

Use these libraries to find Text-To-Speech Synthesis models and implementations.

Library	Papers	Stars
PaddlePaddle/PaddleSpeech	12	10,069
coqui-ai/TTS	10	28,889
keonlee9420/Expressive-FastSpeech2	5	258
TensorSpeech/TensorflowTTS	4	3,686

See all 12 libraries.

Latest papers

KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis

is2ai/kazemotts • 1 Apr 2024

This study focuses on the creation of the KazEmoTTS dataset, designed for emotional Kazakh text-to-speech (TTS) applications.

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models

xiangli2022/cm-tts • 31 Mar 2024

The pursuit of modern models, like Diffusion Models (DMs), holds promise for achieving high-fidelity, real-time speech synthesis.

Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech

ETZET/SpeechEmotionAVLearning • 24 Nov 2023

In this work, we propose to learn the AV representation from categorical emotion labels of speech.

Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning

c3imaging/child_tts_fastpitch • 7 Nov 2023

The approach involved finetuning a multi-speaker TTS model to work with child speech.

Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors

QData/TextAttack • 25 Oct 2023

This paper proposes a method for investigating the impact of speech recognition errors on the performance of natural language understanding models.

2,737 stars

ArTST: Arabic Text and Speech Transformer

mbzuai-nlp/artst • 25 Oct 2023

We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language.

Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling

tiberiu44/TTS-Cube • 14 Oct 2023

We describe an end-to-end speech synthesis system that uses generative adversarial training.

224 stars

Attentive Multi-Layer Perceptron for Non-autoregressive Generation

shark-nlp/attentivemlp • 14 Oct 2023

Furthermore, we marry AMLP with popular NAR models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity.

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

alibaba-damo-academy/funcodec • 7 Oct 2023

In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.

272 stars

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

alibaba-damo-academy/funcodec • 14 Sep 2023

We also demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.

272 stars
