
Text-To-Speech Synthesis

18 papers with code · Speech


Greatest papers with code

Efficient Neural Audio Synthesis

ICML 2018 CorentinJ/Real-Time-Voice-Cloning

The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time.

SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS
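The blurb credits the small weight count for real-time sampling. A common way to reach such sparsity is magnitude pruning during training: zero the smallest-magnitude weights and raise the target sparsity gradually. The sketch below is a minimal illustration under assumptions: it prunes individual weights rather than blocks, uses a cubic ramp (a common choice in the pruning literature) rather than the paper's exact schedule and hyperparameters, and `prune_by_magnitude` and `sparsity_schedule` are hypothetical names.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-|w| fraction set to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def sparsity_schedule(step, target, start_step, ramp_steps):
    """Assumed cubic ramp: 0 before start_step, `target` after the ramp ends."""
    if step <= start_step:
        return 0.0
    if step >= start_step + ramp_steps:
        return target
    progress = (step - start_step) / ramp_steps
    return target * (1.0 - (1.0 - progress) ** 3)


weights = [0.5, -0.1, 0.3, -0.9]
print(prune_by_magnitude(weights, 0.5))  # [0.5, 0.0, 0.0, -0.9]
```

The cubic ramp prunes quickly early on, while many weights are redundant, and slowly near the target, giving the remaining weights time to compensate.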

Tacotron: Towards End-to-End Speech Synthesis

29 Mar 2017 mozilla/TTS

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.

SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS
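The staged design described in the Tacotron blurb can be sketched as three functions with explicit shape contracts. Everything here is hypothetical: the character-ID frontend, `frames_per_token=5`, `n_mels=80` and `hop_length=256` are illustrative assumptions, and the two model stages are stubs that return zeros. Tacotron's proposal is, roughly, to learn the characters-to-spectrogram mapping with a single model, while the paper still uses a separate waveform synthesis step (Griffin-Lim).

```python
def text_frontend(text):
    """Stage 1 (toy): normalize whitespace and case, then map characters to integer IDs."""
    normalized = " ".join(text.lower().split())
    return [ord(ch) for ch in normalized]


def acoustic_model(token_ids, n_mels=80, frames_per_token=5):
    """Stage 2 (stub): a real model predicts these frames; here each is all zeros."""
    return [[0.0] * n_mels for _ in range(len(token_ids) * frames_per_token)]


def vocoder(mel_frames, hop_length=256):
    """Stage 3 (stub): a real vocoder turns frames into audio; here it emits silence."""
    return [0.0] * (len(mel_frames) * hop_length)


def synthesize(text):
    """Chain the stages: text -> token IDs -> mel frames -> waveform samples."""
    return vocoder(acoustic_model(text_frontend(text)))


samples = synthesize("Hi there")
print(len(samples))  # 10240 = 8 characters * 5 frames * 256 samples
```

Each stage boundary is a place where errors compound in a multi-stage system, which is the motivation the blurb gives for an end-to-end model.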

Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

24 Oct 2017 Kyubyong/tacotron

This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without any recurrent units.

TEXT-TO-SPEECH SYNTHESIS
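The guided attention in the title is, as we understand the paper, a penalty on attention mass that lies far from the diagonal of the N x T text-to-frame attention matrix, using weights W[n][t] = 1 - exp(-(n/N - t/T)^2 / (2g^2)). The sketch below computes those weights and a mean-reduced loss in plain Python; the width g = 0.2, the mean reduction, and the function names are our assumptions.

```python
import math


def guided_attention_weights(n_text, n_frames, g=0.2):
    """W[n][t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2)); zero on the diagonal, near 1 far from it."""
    return [
        [1.0 - math.exp(-((n / n_text - t / n_frames) ** 2) / (2.0 * g * g))
         for t in range(n_frames)]
        for n in range(n_text)
    ]


def guided_attention_loss(attention, g=0.2):
    """Mean of A[n][t] * W[n][t] over an N x T attention matrix (rows = text positions)."""
    n_text, n_frames = len(attention), len(attention[0])
    w = guided_attention_weights(n_text, n_frames, g)
    total = sum(attention[n][t] * w[n][t]
                for n in range(n_text) for t in range(n_frames))
    return total / (n_text * n_frames)


diagonal = [[1.0 if n == t else 0.0 for t in range(4)] for n in range(4)]
print(guided_attention_loss(diagonal))  # 0.0
```

Because text and speech advance roughly monotonically together, pushing attention toward the diagonal early in training is a cheap prior that can speed up alignment learning.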

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

25 Oct 2019 TensorSpeech/TensorflowTTS

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network.

SPEECH SYNTHESIS · TEST RESULTS · TEXT-TO-SPEECH SYNTHESIS
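The multi-resolution spectrogram in the title refers to an auxiliary loss that compares STFT magnitudes of real and generated audio at several (FFT size, hop, window) settings. A minimal sketch, assuming each resolution contributes spectral convergence plus mean absolute log-magnitude error and that resolutions are averaged. It uses a naive DFT so only the standard library is needed; the tiny resolutions in the usage line are ours, chosen for speed, not the paper's settings, and all function names are hypothetical.

```python
import cmath
import math


def stft_magnitude(x, n_fft, hop, win_length):
    """Naive STFT magnitudes (frames x bins) with a Hann window centred in n_fft."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / win_length) for i in range(win_length)]
    left = (n_fft - win_length) // 2
    window = [0.0] * left + hann + [0.0] * (n_fft - win_length - left)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + i] * window[i] for i in range(n_fft)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def stft_loss(x, y, n_fft, hop, win_length, eps=1e-7):
    """Spectral convergence (relative to reference x) + mean |log|X| - log|Y||."""
    sx = stft_magnitude(x, n_fft, hop, win_length)
    sy = stft_magnitude(y, n_fft, hop, win_length)
    pairs = [(a, b) for fx, fy in zip(sx, sy) for a, b in zip(fx, fy)]
    diff_norm = math.sqrt(sum((a - b) ** 2 for a, b in pairs))
    ref_norm = math.sqrt(sum(a * a for a, _ in pairs))
    spectral_convergence = diff_norm / max(ref_norm, eps)
    log_mag = sum(abs(math.log(max(a, eps)) - math.log(max(b, eps)))
                  for a, b in pairs) / len(pairs)
    return spectral_convergence + log_mag


def multi_resolution_stft_loss(x, y, resolutions):
    """Average the single-resolution loss over (n_fft, hop, win_length) triples."""
    return sum(stft_loss(x, y, *r) for r in resolutions) / len(resolutions)


reference = [math.sin(2 * math.pi * 0.05 * n) for n in range(128)]
generated = [s + 0.1 * math.cos(0.7 * n) for n, s in enumerate(reference)]
loss = multi_resolution_stft_loss(reference, generated, [(16, 4, 16), (32, 8, 24)])
```

Using several resolutions keeps the generator from fitting one particular time-frequency trade-off at the expense of others.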

FastSpeech: Fast, Robust and Controllable Text-to-Speech

22 May 2019 TensorSpeech/TensorflowTTS

Compared with traditional concatenative and statistical parametric approaches, neural-network-based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (e.g., of voice speed or prosody).

TEXT-TO-SPEECH SYNTHESIS
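The skipping and repeating failure described in this blurb can be read off an attention alignment: take the most-attended input token for each output frame, then look for tokens that never win (skipped) and tokens that win again after the path has moved on (repeated). This is an illustrative diagnostic of our own, not a procedure from the paper, and the function names are hypothetical. Counting frames per token this way is, as we understand it, close to how FastSpeech derives phoneme durations from an autoregressive teacher's attention.

```python
def alignment_durations(attention):
    """attention[frame][token]: count how many frames pick each token as their argmax."""
    n_tokens = len(attention[0])
    durations = [0] * n_tokens
    for frame in attention:
        durations[max(range(n_tokens), key=lambda i: frame[i])] += 1
    return durations


def diagnose_alignment(attention):
    """Return (skipped, repeated): tokens with zero frames, tokens revisited later."""
    path = [max(range(len(frame)), key=lambda i: frame[i]) for frame in attention]
    skipped = [i for i, d in enumerate(alignment_durations(attention)) if d == 0]
    repeated, seen, prev = set(), set(), None
    for token in path:
        # A token is "repeated" if the path returns to it after leaving it.
        if token != prev and token in seen:
            repeated.add(token)
        seen.add(token)
        prev = token
    return skipped, sorted(repeated)


# 4 output frames over 3 input tokens; argmax path is 0, 2, 2, 0.
bad = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
print(diagnose_alignment(bad))  # ([1], [0])
```

A model whose output length is fixed by explicit per-token durations cannot skip a token with nonzero duration or jump back to an earlier one, which is the intuition behind the robustness claim.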

FastSpeech: Fast, Robust and Controllable Text to Speech

NeurIPS 2019 xcmyz/FastSpeech

In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrograms in parallel for TTS.

SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS
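Generating mel-spectrograms in parallel requires knowing in advance how many frames each phoneme occupies. FastSpeech's length regulator expands each phoneme's hidden state by its duration, and a scalar alpha rescales all durations to control speaking rate (alpha > 1 slows speech, alpha < 1 speeds it up). A sketch under stated assumptions: states are plain Python values standing in for vectors, the round-half-up rule is our choice, and `length_regulator` is a hypothetical name.

```python
import math


def length_regulator(hidden, durations, alpha=1.0):
    """Expand phoneme-level states to frame level; alpha scales every duration."""
    if len(hidden) != len(durations):
        raise ValueError("need one duration per phoneme state")
    expanded = []
    for state, d in zip(hidden, durations):
        # Round half up so that e.g. 1.5 -> 2 and 0.5 -> 1.
        repeats = math.floor(d * alpha + 0.5)
        expanded.extend([state] * repeats)
    return expanded


phonemes = ["a", "b", "c", "d"]
durations = [2, 2, 3, 1]
print(length_regulator(phonemes, durations, alpha=0.5))  # ['a', 'b', 'c', 'c', 'd']
```

Once the frame count is fixed by the regulator, every output frame can be decoded at the same time, which is where the parallel speed-up comes from.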

Neural Speech Synthesis with Transformer Network

19 Sep 2018 as-ideas/TransformerTTS

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron 2) have been proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference, and 2) difficulty modeling long-range dependencies with current recurrent neural networks (RNNs).

MACHINE TRANSLATION · SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS
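The long-dependency problem arises because an RNN passes information one step at a time, whereas self-attention links every pair of positions in a single operation and can process all positions in parallel during training. A minimal sketch of single-head scaled dot-product self-attention, assuming queries, keys, and values all equal the input; a real Transformer TTS layer adds learned projections, multiple heads, and positional encodings, and the function names here are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x (no projections)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Score this query against every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights


out, attn = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Here the first position's output draws directly on the last position, with no intermediate steps in between; in an RNN that information would have to survive the whole recurrence.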

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

12 May 2020 NVIDIA/flowtron

In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.

 Ranked #1 on Text-To-Speech Synthesis on LJSpeech (Pleasantness MOS metric)

SPEECH SYNTHESIS · STYLE TRANSFER · TEXT-TO-SPEECH SYNTHESIS
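An autoregressive flow transforms each mel frame with an invertible affine step whose scale and shift depend on earlier frames, so the exact log-likelihood is the Gaussian log-density of the latents plus the summed log-scales. The sketch below assumes a toy `conditioner` in place of Flowtron's learned network and a particular direction for the affine step; only the overall structure (invertible, autoregressive, tractable likelihood, sampling with an adjustable sigma) is meant to carry over, and all names are hypothetical. As we understand the paper, sampling with a smaller sigma is how variation is turned down.

```python
import math
import random


def conditioner(prev_frame, text_value):
    """Hypothetical stand-in for a learned network: previous frame -> (log_scale, shift)."""
    log_scale = [0.1 * math.tanh(p + text_value) for p in prev_frame]
    shift = [0.5 * p for p in prev_frame]
    return log_scale, shift


def flow_forward(frames, text_value):
    """Data -> latent: z_t = x_t * exp(log_s) + shift, conditioned on x_{t-1}."""
    prev = [0.0] * len(frames[0])
    latents, log_det = [], 0.0
    for x in frames:
        log_s, shift = conditioner(prev, text_value)
        latents.append([xi * math.exp(ls) + t for xi, ls, t in zip(x, log_s, shift)])
        log_det += sum(log_s)  # log|det dz/dx| of an elementwise affine map
        prev = x
    return latents, log_det


def flow_inverse(latents, text_value):
    """Latent -> data, one frame at a time because each step needs the previous output."""
    prev = [0.0] * len(latents[0])
    frames = []
    for z in latents:
        log_s, shift = conditioner(prev, text_value)
        x = [(zi - t) * math.exp(-ls) for zi, ls, t in zip(z, log_s, shift)]
        frames.append(x)
        prev = x
    return frames


def log_likelihood(frames, text_value):
    """log p(x) = log N(z; 0, I) + log|det dz/dx|."""
    latents, log_det = flow_forward(frames, text_value)
    log_normal = sum(-0.5 * (zi * zi + math.log(2 * math.pi)) for z in latents for zi in z)
    return log_normal + log_det


def sample(n_frames, dim, text_value, sigma=1.0, seed=0):
    """Draw z ~ N(0, sigma^2 I) and invert; smaller sigma gives less variation."""
    rng = random.Random(seed)
    latents = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_frames)]
    return flow_inverse(latents, text_value)


frames = sample(n_frames=5, dim=2, text_value=0.3, sigma=0.5)
```

Training maximizes `log_likelihood` directly, with no separate discriminator or distillation step, and inference runs the inverse frame by frame.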

MelNet: A Generative Model for Audio in the Frequency Domain

4 Jun 2019 fatchord/MelNet

Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.

AUDIO GENERATION · MUSIC GENERATION · SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS
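The MelNet blurb's timestep count is easy to check, and so is the saving from moving to the frequency domain. Assuming common TTS settings (22,050 Hz audio, hop length 256, 80 mel bins; none of these come from the MelNet paper), one second is 22,050 waveform samples but only 87 spectrogram frames. MelNet still generates spectrogram elements autoregressively, so the total cell count stays in the thousands; the gain is a much shorter time axis. The helper name is hypothetical.

```python
def sequence_lengths(seconds, sample_rate, hop_length, n_mels):
    """Compare time steps in a waveform vs. a centre-padded spectrogram."""
    samples = int(seconds * sample_rate)
    frames = samples // hop_length + 1  # one frame per hop, plus one from centre padding
    return {
        "waveform_samples": samples,
        "spectrogram_frames": frames,
        "spectrogram_cells": frames * n_mels,
    }


print(sequence_lengths(1.0, 22050, 256, 80))
# {'waveform_samples': 22050, 'spectrogram_frames': 87, 'spectrogram_cells': 6960}
```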