Speech Synthesis

290 papers with code • 4 benchmarks • 19 datasets

Speech synthesis is the task of generating speech from some other modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable across studies, as they use mean opinion score (MOS) as the metric and collect ratings from different samples of Amazon Mechanical Turk workers.

(Image credit: WaveNet: A Generative Model for Raw Audio)
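
Since MOS is a subjective, listener-rated metric, the reported numbers depend heavily on who rated which sentences. As a minimal illustration (not tied to any particular study), the snippet below computes a MOS and an approximate 95% confidence interval from raw 1-5 ratings; real evaluations additionally differ in rater screening, sentence selection, and scale anchoring.

```python
import numpy as np

def mos_with_ci(ratings, z=1.96):
    """Mean opinion score and approximate 95% confidence interval.

    `ratings` is a flat list of 1-5 listener scores for one system.
    Different papers use different raters and test sentences, which is
    why intervals from separate studies are rarely comparable."""
    r = np.asarray(ratings, dtype=float)
    mos = r.mean()
    ci = z * r.std(ddof=1) / np.sqrt(len(r))
    return mos, ci

print(mos_with_ci([4, 5, 4, 3, 5, 4, 4]))  # ~ (4.14, 0.51)
```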

HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks

declare-lab/hypertts 6 Apr 2024

In this work, we present HyperTTS, which comprises a small learnable network, a "hypernetwork", that generates the parameters of the Adapter blocks, allowing us to condition the Adapters on speaker representations and make them dynamic.
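
The abstract describes adapter blocks whose weights are produced by a hypernetwork conditioned on a speaker representation. The sketch below is a rough PyTorch illustration of that general idea, not the authors' implementation: it assumes a single (unbatched) speaker embedding and a plain bottleneck adapter, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    """Bottleneck adapter whose weights come from a small hypernetwork
    conditioned on a speaker embedding (illustrative dimensions only)."""

    def __init__(self, hidden_dim=256, bottleneck=32, spk_dim=128):
        super().__init__()
        self.hidden_dim, self.bottleneck = hidden_dim, bottleneck
        n_params = 2 * hidden_dim * bottleneck            # down- and up-projection weights
        self.hypernet = nn.Sequential(                    # the "small learnable network"
            nn.Linear(spk_dim, 256), nn.ReLU(), nn.Linear(256, n_params)
        )

    def forward(self, x, spk_emb):
        # x: (time, hidden_dim); spk_emb: (spk_dim,) for a single speaker
        w = self.hypernet(spk_emb)                        # adapter weights, generated on the fly
        w_down, w_up = w.split(self.hidden_dim * self.bottleneck, dim=-1)
        w_down = w_down.view(self.bottleneck, self.hidden_dim)
        w_up = w_up.view(self.hidden_dim, self.bottleneck)
        h = torch.relu(x @ w_down.t())                    # down-project
        return x + h @ w_up.t()                           # residual adapter output
```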

KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis

is2ai/kazemotts 1 Apr 2024

This study focuses on the creation of the KazEmoTTS dataset, designed for emotional Kazakh text-to-speech (TTS) applications.

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models

xiangli2022/cm-tts 31 Mar 2024

The pursuit of modern models, like Diffusion Models (DMs), holds promise for achieving high-fidelity, real-time speech synthesis.

Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation

rohan-chaudhury/humane-speech-synthesis-through-zero-shot-emotion-and-disfluency-generation 31 Mar 2024

Contemporary conversational systems often present a significant limitation: their responses lack the emotional depth and disfluent characteristics of human interactions.

Towards Decoding Brain Activity During Passive Listening of Speech

milaniusz/speech2brain2speech 26 Feb 2024

The aim of the study is to investigate the complex mechanisms of speech perception and ultimately decode the electrical changes in the brain occurring while listening to speech.

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

walker-hyf/ecss 19 Dec 2023

Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection

cecile-hi/regularized-adaptive-weight-modification 15 Dec 2023

The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms.

Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism

g-milis/NEUTART 11 Dec 2023

Our method, which we call NEUral Text to ARticulate Talk (NEUTART), is a talking face generator that uses a joint audiovisual feature space, as well as speech-informed 3D facial reconstructions and a lip-reading loss for visual supervision.
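
The excerpt mentions joint acoustic and visual supervision, including a lip-reading loss on the generated video. The snippet below is only a plausible sketch of how such a combined objective might be assembled; the loss terms, weights, and the assumed frozen lip-reading network are illustrative placeholders, not NEUTART's actual code.

```python
import torch.nn.functional as F

def audiovisual_loss(pred_mel, gt_mel, pred_frames, gt_frames,
                     lipread_net, transcript_ids,
                     w_audio=1.0, w_visual=1.0, w_lip=0.1):
    # Acoustic reconstruction term (e.g., L1 on mel-spectrograms).
    audio_term = F.l1_loss(pred_mel, gt_mel)
    # Visual reconstruction term on the rendered talking-face frames.
    visual_term = F.l1_loss(pred_frames, gt_frames)
    # Lip-reading supervision: a frozen lip-reading model is assumed to
    # return per-frame character logits of shape (batch, time, vocab).
    logits = lipread_net(pred_frames)
    lip_term = F.cross_entropy(logits.transpose(1, 2), transcript_ids)
    return w_audio * audio_term + w_visual * visual_term + w_lip * lip_term
```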

Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech

ETZET/SpeechEmotionAVLearning 24 Nov 2023

In this work, we propose to learn the AV representation from categorical emotion labels of speech.

HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

sh-lee-prml/hierspeechpp 21 Nov 2023

Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.
