Text-To-Speech Synthesis

92 papers with code • 6 benchmarks • 17 datasets

Text-to-speech (TTS) synthesis is the machine learning task of converting written text into spoken audio. The goal is to generate synthetic speech that sounds natural and as close to human speech as possible.

KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis

is2ai/kazemotts 1 Apr 2024

This study focuses on the creation of the KazEmoTTS dataset, designed for emotional Kazakh text-to-speech (TTS) applications.

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models

xiangli2022/cm-tts 31 Mar 2024

Modern generative models such as Diffusion Models (DMs) hold promise for achieving high-fidelity, real-time speech synthesis.

Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech

ETZET/SpeechEmotionAVLearning 24 Nov 2023

In this work, we propose to learn the arousal-valence (AV) representation from categorical emotion labels of speech.

Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning

c3imaging/child_tts_fastpitch 7 Nov 2023

The approach involved fine-tuning a multi-speaker TTS model to work with child speech.

Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors

QData/TextAttack 25 Oct 2023

This paper proposes a method for investigating the impact of speech recognition errors on the performance of natural language understanding models.

ArTST: Arabic Text and Speech Transformer

mbzuai-nlp/artst 25 Oct 2023

We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language.

Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling

tiberiu44/TTS-Cube 14 Oct 2023

We describe an end-to-end speech synthesis system that uses generative adversarial training.

Attentive Multi-Layer Perceptron for Non-autoregressive Generation

shark-nlp/attentivemlp 14 Oct 2023

We combine AMLP with popular non-autoregressive (NAR) models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity.

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

alibaba-damo-academy/funcodec 7 Oct 2023

In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

alibaba-damo-academy/funcodec 14 Sep 2023

We demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.