Speech Synthesis

292 papers with code • 4 benchmarks • 19 datasets

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable between studies, because they use mean opinion score (MOS) as the metric and each study collects its own samples from Amazon Mechanical Turk.
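As a rough illustration of why such scores are hard to compare, the sketch below computes a mean opinion score and a normal-approximation 95% confidence interval from two sets of listener ratings. The function name `mean_opinion_score` and both rating lists are hypothetical and not taken from any study listed here.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.mean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    # 1.96 is the z-value for a 95% interval under a normal approximation.
    return mos, 1.96 * sem


# Hypothetical ratings of the same audio from two separate listener pools.
study_a = [4, 5, 4, 4, 3, 5, 4, 4]
study_b = [3, 4, 3, 4, 3, 3, 4, 3]

for name, ratings in (("study A", study_a), ("study B", study_b)):
    mos, half_width = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {half_width:.2f}")
```

Because each pool of listeners and each set of samples differs, the two scores describe different experiments, so a higher MOS in one study does not by itself imply a better system.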

(Image credit: WaveNet: A generative model for raw audio)


HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

sh-lee-prml/hierspeechpp 21 Nov 2023

Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.

1,077

APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra

redmist328/apnet2 20 Nov 2023

APNet demonstrates the capability to generate synthesized speech of comparable quality to the HiFi-GAN vocoder but with a considerably improved inference speed.

44

ChatGPT in the context of precision agriculture data analytics

potamitis123/chatgpt-in-the-context-of-precision-agriculture-data-analytics 10 Nov 2023

In this work we argue that the speech recognition input modality of ChatGPT provides a more intuitive and natural way for policy makers to interact with the database of the server of an agricultural data processing system to which a large, dispersed network of automated insect traps and sensors probes reports.

2

Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning

c3imaging/child_tts_fastpitch 7 Nov 2023

The approach involved finetuning a multi-speaker TTS model to work with child speech.

3

ArTST: Arabic Text and Speech Transformer

mbzuai-nlp/artst 25 Oct 2023

We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language.

16

AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing

ucla-trustworthy-ai-lab/autodiffusion 24 Oct 2023

Diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language model, or speech synthesis.

2

Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling

tiberiu44/TTS-Cube 14 Oct 2023

We describe an end-to-end speech synthesis system that uses generative adversarial training.

224

Attentive Multi-Layer Perceptron for Non-autoregressive Generation

shark-nlp/attentivemlp 14 Oct 2023

Furthermore, we marry AMLP with popular NAR models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity.

2

Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting

nii-yamagishilab/partial_rank_similarity 8 Oct 2023

That is, the partial rank similarity (PRS) is measured rather than the individual MOS values as with the L1 loss.

2

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

alibaba-damo-academy/funcodec 7 Oct 2023

In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.

280