Speech Synthesis
292 papers with code • 4 benchmarks • 19 datasets
Speech synthesis is the task of generating speech from another modality, such as text or lip movements.
Please note that the leaderboards here are not directly comparable across studies: they use mean opinion score (MOS) as the metric, but each study collects ratings from a different sample of Amazon Mechanical Turk workers.
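To make the comparability caveat concrete, here is a minimal sketch of how MOS is typically computed: a plain average of listener ratings on a 1-to-5 scale. The helper name and the rating values are hypothetical, but they illustrate why two studies with different rater pools can report different scores for the same system.

```python
def mean_opinion_score(ratings):
    """Average a list of 1-5 listener ratings (hypothetical helper)."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)

# Two rater pools judging the same system can yield different MOS values,
# which is why cross-study leaderboard numbers are not directly comparable.
study_a = [4, 5, 4, 4, 5]  # hypothetical ratings from one rater pool
study_b = [3, 4, 4, 3, 4]  # hypothetical ratings from another pool
print(mean_opinion_score(study_a))  # 4.4
print(mean_opinion_score(study_b))  # 3.6
```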
(Image credit: WaveNet: A Generative Model for Raw Audio)
Libraries
Use these libraries to find Speech Synthesis models and implementations.
Subtasks
- Expressive Speech Synthesis
- Emotional Speech Synthesis
- Text-to-Speech Translation
- Speech Synthesis - Tamil
- Speech Synthesis - Kannada
- Speech Synthesis - Malayalam
- Speech Synthesis - Telugu
- Speech Synthesis - Assamese
- Speech Synthesis - Bengali
- Speech Synthesis - Bodo
- Speech Synthesis - Gujarati
- Speech Synthesis - Hindi
- Speech Synthesis - Manipuri
- Speech Synthesis - Marathi
- Speech Synthesis - Rajasthani
Latest papers
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.
APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra
APNet demonstrates the capability to generate synthesized speech of comparable quality to the HiFi-GAN vocoder but with a considerably improved inference speed.
ChatGPT in the context of precision agriculture data analytics
In this work, we argue that ChatGPT's speech-recognition input modality gives policy makers a more intuitive and natural way to interact with the database of an agricultural data-processing server to which a large, dispersed network of automated insect traps and sensor probes reports.
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
The approach involved fine-tuning a multi-speaker TTS model to work with child speech.
ArTST: Arabic Text and Speech Transformer
We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language.
AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing
Diffusion models have become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language modeling, and speech synthesis.
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling
We describe an end-to-end speech synthesis system that uses generative adversarial training.
Attentive Multi-Layer Perceptron for Non-autoregressive Generation
Furthermore, we marry AMLP with popular NAR models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity.
Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting
That is, the partial rank similarity (PRS) is measured rather than the individual MOS values, as with an L1 loss.
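The idea of training on relative ordering rather than absolute values can be sketched with a simple pairwise ranking loss. This is a generic hinge-style illustration, not the paper's exact PRS formulation; the function name, margin parameter, and inputs are all hypothetical.

```python
def pairwise_rank_loss(pred, true, margin=0.0):
    """Hypothetical hinge loss over ordered pairs: penalize pairs whose
    predicted ordering disagrees with the ground-truth ordering."""
    loss, pairs = 0.0, 0
    for i in range(len(pred)):
        for j in range(len(pred)):
            if true[i] > true[j]:  # item i should be ranked above item j
                loss += max(0.0, margin - (pred[i] - pred[j]))
                pairs += 1
    return loss / pairs if pairs else 0.0

# Predictions that preserve the true ordering incur zero loss even when
# their absolute values are shifted, which an L1 loss would penalize:
print(pairwise_rank_loss([1.0, 2.0, 3.0], [3.0, 4.0, 5.0]))  # 0.0
```

The contrast with L1 is the point: a rank-based objective ignores systematic offsets in predicted MOS and only cares whether systems are ordered correctly, which is useful when rating scales differ across listening tests.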
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.