Speech Synthesis
291 papers with code • 4 benchmarks • 19 datasets
Speech synthesis is the task of generating speech from another modality, such as text or lip movements.
Please note that the leaderboards here are not directly comparable between studies, because they use mean opinion score (MOS) as the metric and each study collects ratings on different samples from Amazon Mechanical Turk.
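To make the comparability caveat concrete, here is a minimal sketch of how a mean opinion score is typically summarized: the average of listener ratings on a 1 to 5 scale, reported with a confidence interval. The ratings, the function name `mean_opinion_score`, and the normal-approximation interval are all illustrative assumptions, not values or methods taken from any leaderboard entry.

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.mean(ratings)
    # Normal-approximation interval (an assumption; small panels may call for a t-interval).
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width

# Hypothetical ratings from two studies with different listeners and samples.
study_a = [4, 5, 4, 4, 3, 5, 4, 4]
study_b = [3, 4, 4, 3, 4, 3, 4, 5]

mos_a, ci_a = mean_opinion_score(study_a)
mos_b, ci_b = mean_opinion_score(study_b)
print(f"Study A: {mos_a:.2f} +/- {ci_a:.2f}")
print(f"Study B: {mos_b:.2f} +/- {ci_b:.2f}")
```

Because each study draws its own listeners and test utterances, a gap between two such scores may reflect the rating setup as much as the systems themselves.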
(Image credit: WaveNet: A Generative Model for Raw Audio)
Subtasks
- Expressive Speech Synthesis
- Emotional Speech Synthesis
- Text-to-Speech Translation
- Speech Synthesis - Tamil
- Speech Synthesis - Kannada
- Speech Synthesis - Malayalam
- Speech Synthesis - Telugu
- Speech Synthesis - Assamese
- Speech Synthesis - Bengali
- Speech Synthesis - Bodo
- Speech Synthesis - Gujarati
- Speech Synthesis - Hindi
- Speech Synthesis - Manipuri
- Speech Synthesis - Marathi
- Speech Synthesis - Rajasthani
Most implemented papers
Exploring Transfer Learning for Low Resource Emotional TTS
Over the last few years, spoken language technologies have improved substantially thanks to deep learning.
MelNet: A Generative Model for Audio in the Frequency Domain
Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.
Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks
In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator.
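The weighted-sum objective described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's implementation: mean squared error stands in for the minimum generation loss, the adversarial term is the cross-entropy of the discriminator labelling generated parameters as natural, and the names `generation_loss`, `adversarial_loss`, `acoustic_model_loss`, and `weight` are hypothetical.

```python
import math

def generation_loss(natural, generated):
    # Mean squared error, used here as a stand-in for the minimum generation loss.
    return sum((n - g) ** 2 for n, g in zip(natural, generated)) / len(natural)

def adversarial_loss(disc_probs):
    # disc_probs: discriminator's probability that each generated frame is natural.
    # The loss is small when the discriminator is deceived (probabilities near 1).
    return -sum(math.log(p) for p in disc_probs) / len(disc_probs)

def acoustic_model_loss(natural, generated, disc_probs, weight=1.0):
    # Weighted sum of the conventional loss and the adversarial loss.
    return generation_loss(natural, generated) + weight * adversarial_loss(disc_probs)

# Hypothetical values: two frames, discriminator undecided (p = 0.5) on both.
loss = acoustic_model_loss([0.0, 0.0], [1.0, 1.0], [0.5, 0.5], weight=0.5)
print(loss)
```

Setting `weight` to 0 recovers the conventional training objective; larger values push the acoustic model harder toward outputs the discriminator accepts as natural.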
Tools and resources for Romanian text-to-speech and speech-to-text applications
In this paper we introduce a set of resources and tools aimed at providing support for natural language processing, text-to-speech synthesis and speech recognition for Romanian.
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages.
DurIAN: Duration Informed Attention Network For Multimodal Synthesis
In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously.
WaveFlow: A Compact Flow-based Model for Raw Audio
WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases.
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit
This paper presents fairseq S^2, a fairseq extension for speech synthesis.
Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme
Voice conversion is a common speech synthesis task which can be solved in different ways depending on a particular real-world scenario.
A Critical Review of Recurrent Neural Networks for Sequence Learning
Recurrent neural networks (RNNs) are connectionist models that capture the dynamics of sequences via cycles in the network of nodes.