Speech Synthesis
290 papers with code • 4 benchmarks • 19 datasets
Speech synthesis is the task of generating speech from another modality, such as text or lip movements.
Please note that the leaderboards here are not directly comparable across studies, as they use mean opinion score (MOS) as the metric and collect ratings from different samples of Amazon Mechanical Turk workers.
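As a quick illustration of why MOS-based leaderboards are hard to compare, here is a minimal sketch of how a mean opinion score is typically computed: listeners rate each sample on a 1-5 scale, and the reported number is the mean, usually with a 95% confidence interval. The function name and the example ratings below are illustrative, not taken from any specific study.

```python
import math

def mean_opinion_score(ratings):
    """Return (mean, 95% confidence half-width) for a list of 1-5 ratings.

    Uses the sample standard deviation and a normal approximation,
    which is the common convention in TTS papers.
    """
    n = len(ratings)
    mean = sum(ratings) / n
    var = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    ci = 1.96 * math.sqrt(var / n)  # normal-approximation 95% interval
    return mean, ci

# Hypothetical ratings from one listening test
ratings = [4, 5, 4, 3, 4, 5, 4, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Because the mean depends entirely on who the raters are and which samples they hear, two systems evaluated in different studies can report similar MOS values that are not actually comparable.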
(Image credit: WaveNet: A Generative Model for Raw Audio)
Libraries
Use these libraries to find Speech Synthesis models and implementations
Datasets
Subtasks
- Expressive Speech Synthesis
- Emotional Speech Synthesis
- text-to-speech translation
- Speech Synthesis - Tamil
- Speech Synthesis - Kannada
- Speech Synthesis - Malayalam
- Speech Synthesis - Telugu
- Speech Synthesis - Assamese
- Speech Synthesis - Bengali
- Speech Synthesis - Bodo
- Speech Synthesis - Gujarati
- Speech Synthesis - Hindi
- Speech Synthesis - Manipuri
- Speech Synthesis - Marathi
- Speech Synthesis - Rajasthani
Latest papers
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
In this work, we present HyperTTS, which comprises a small learnable network, "hypernetwork", that generates parameters of the Adapter blocks, allowing us to condition Adapters on speaker representations and making them dynamic.
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis
This study focuses on the creation of the KazEmoTTS dataset, designed for emotional Kazakh text-to-speech (TTS) applications.
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
The pursuit of modern models, like Diffusion Models (DMs), holds promise for achieving high-fidelity, real-time speech synthesis.
Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation
Contemporary conversational systems often present a significant limitation: their responses lack the emotional depth and disfluent characteristics of human interactions.
Towards Decoding Brain Activity During Passive Listening of Speech
The aim of the study is to investigate the complex mechanisms of speech perception and ultimately decode the electrical changes occurring in the brain while listening to speech.
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection
The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms.
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism
Our method, which we call NEUral Text to ARticulate Talk (NEUTART), is a talking face generator that uses a joint audiovisual feature space, as well as speech-informed 3D facial reconstructions and a lip-reading loss for visual supervision.
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech
In this work, we propose to learn the AV representation from categorical emotion labels of speech.
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.