Text-To-Speech Synthesis

11 papers with code · Speech

Greatest papers with code

Efficient Neural Audio Synthesis

ICML 2018 CorentinJ/Real-Time-Voice-Cloning

The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time.

SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS

Tacotron: Towards End-to-End Speech Synthesis

29 Mar 2017 keithito/tacotron

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.

SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS
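
The multi-stage structure mentioned in the Tacotron summary above can be pictured with a toy sketch. Every function here (`text_frontend`, `acoustic_model`, `audio_synthesis`, `synthesize`) is a hypothetical placeholder that only shows how data is handed from one stage to the next; none of it comes from the paper or the linked repository, and no real signal processing is performed.

```python
from typing import List


def text_frontend(text: str) -> List[str]:
    """Stage 1 (placeholder): normalise text into a symbol sequence."""
    return [ch for ch in text.lower() if ch.isalpha() or ch == " "]


def acoustic_model(symbols: List[str], frames_per_symbol: int = 2) -> List[List[float]]:
    """Stage 2 (placeholder): emit a fixed number of feature frames per symbol."""
    return [[float(ord(s))] for s in symbols for _ in range(frames_per_symbol)]


def audio_synthesis(frames: List[List[float]], samples_per_frame: int = 4) -> List[float]:
    """Stage 3 (placeholder): expand each feature frame into waveform samples."""
    return [frame[0] / 255.0 for frame in frames for _ in range(samples_per_frame)]


def synthesize(text: str) -> List[float]:
    """Chain the three stages: text -> symbols -> frames -> samples."""
    symbols = text_frontend(text)
    frames = acoustic_model(symbols)
    return audio_synthesis(frames)


samples = synthesize("Hi")
print(len(samples))  # 2 symbols x 2 frames x 4 samples per frame
```

An end-to-end model of the kind the title describes would aim to replace these separately built stages with a single trained network.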

Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

24 Oct 2017 Kyubyong/tacotron

This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without any recurrent units.

TEXT-TO-SPEECH SYNTHESIS

MelNet: A Generative Model for Audio in the Frequency Domain

4 Jun 2019 fatchord/MelNet

Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.

AUDIO GENERATION · MUSIC GENERATION · SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS
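
The "tens of thousands of timesteps" figure in the MelNet summary above follows from simple arithmetic on the sampling rate. The sketch below compares the length of one second of raw audio with the length of a spectrogram of the same clip. The sample rate (22,050 Hz) and hop length (256 samples) are assumed, commonly used values chosen for illustration; they are not taken from the paper.

```python
# Assumed illustrative values, not taken from the MelNet paper.
SAMPLE_RATE_HZ = 22_050   # waveform samples per second
HOP_LENGTH = 256          # waveform samples between successive spectrogram frames


def waveform_timesteps(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of raw waveform samples in a clip of the given duration."""
    return int(seconds * sample_rate)


def spectrogram_frames(seconds: float,
                       sample_rate: int = SAMPLE_RATE_HZ,
                       hop_length: int = HOP_LENGTH) -> int:
    """Number of spectrogram frames for the same clip (ceiling division by the hop)."""
    n_samples = waveform_timesteps(seconds, sample_rate)
    return -(-n_samples // hop_length)


one_second_samples = waveform_timesteps(1.0)
one_second_frames = spectrogram_frames(1.0)
print(one_second_samples, one_second_frames)
```

Under these assumptions, the time axis of a spectrogram is a couple of hundred times shorter than the waveform, which is one reason to model audio in the frequency domain.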

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

25 Oct 2019 kan-bayashi/ParallelWaveGAN

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network.

SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS

Parallel Neural Text-to-Speech

ICLR 2020 ksw0306/WaveVAE

In this work, we propose a non-autoregressive seq2seq model that converts text to spectrogram.

TEXT-TO-SPEECH SYNTHESIS

Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language

29 Oct 2018 nii-yamagishilab/self-attention-tacotron

Towards end-to-end Japanese speech synthesis, we extend Tacotron to systems with self-attention to capture long-term dependencies related to pitch accents and compare their audio quality with classical pipeline systems under various conditions to show their pros and cons.

SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS

The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems

25 Jun 2018 numediart/EmoV-DB

In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purposes.

SPEECH EMOTION RECOGNITION · SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS