WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

2 Feb 2020 · Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

Tacotron-based text-to-speech (TTS) systems directly synthesize speech from text input. Such frameworks typically consist of a feature prediction network that maps character sequences to frequency-domain acoustic features, followed by a waveform reconstruction algorithm or a neural vocoder that generates the time-domain waveform from those acoustic features. Because the training loss is computed only on the frequency-domain acoustic features, the quality of the generated time-domain waveform is not directly optimized. WaveTTS addresses this with a joint time-frequency domain loss that combines a frequency-domain loss on the acoustic features with a time-domain loss on the reconstructed waveform, so that both the predicted features and the resulting speech waveform are optimized during training.
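
To make the joint objective concrete, here is a minimal PyTorch sketch of a two-term loss of this kind. The function name joint_tf_loss, the callable griffin_lim, the weight alpha, and the choice of L1 distances are illustrative assumptions, not the paper's exact formulation.

    import torch.nn.functional as F

    def joint_tf_loss(mel_pred, mel_true, wav_true, griffin_lim, alpha=1.0):
        """Frequency-domain feature loss plus time-domain waveform loss.

        mel_pred, mel_true: (batch, n_mels, frames) Mel-scale acoustic features.
        wav_true:           (batch, samples) natural waveform.
        griffin_lim:        callable mapping Mel features to a waveform; it must
                            be differentiable (e.g. built from torch.stft/istft)
                            for gradients to reach the feature prediction network.
        alpha:              weight on the time-domain term (assumed hyperparameter).
        """
        # Frequency-domain loss: distortion between predicted and natural features.
        freq_loss = F.l1_loss(mel_pred, mel_true)

        # Time-domain loss: reconstruct a waveform from the predicted features
        # and measure its distortion against the natural waveform.
        wav_pred = griffin_lim(mel_pred)
        n = min(wav_pred.size(-1), wav_true.size(-1))  # align lengths
        time_loss = F.l1_loss(wav_pred[..., :n], wav_true[..., :n])

        return freq_loss + alpha * time_loss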

Methods used in the Paper

METHOD                              TYPE
Mixture of Logistic Distributions   Output Functions
Dilated Causal Convolution          Temporal Convolutions
WaveNet                             Generative Audio Models
LSTM                                Recurrent Neural Networks
Weight Decay                        Regularization
Adam                                Stochastic Optimization
Convolution                         Convolutions
Griffin-Lim Algorithm               Phase Reconstruction
Location Sensitive Attention        Attention Mechanisms
Linear Layer                        Feedforward Networks
BiLSTM                              Bidirectional Recurrent Neural Networks
WaveTTS                             Text-to-Speech Models
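
The Griffin-Lim algorithm listed above recovers phase from a magnitude spectrogram by alternating between the time and frequency domains. Below is a minimal NumPy/librosa sketch of the classic procedure, assuming a linear-frequency magnitude spectrogram and illustrative STFT parameters; librosa also ships a ready-made librosa.griffinlim.

    import numpy as np
    import librosa

    def griffin_lim(magnitude, n_iter=32, n_fft=1024, hop_length=256):
        # magnitude: (1 + n_fft // 2, frames) linear magnitude spectrogram.
        # Start from random phase, then alternate ISTFT/STFT, keeping the
        # known magnitude and the newly estimated phase on every iteration.
        angles = np.exp(2j * np.pi * np.random.rand(*magnitude.shape))
        for _ in range(n_iter):
            wav = librosa.istft(magnitude * angles,
                                hop_length=hop_length, win_length=n_fft)
            stft = librosa.stft(wav, n_fft=n_fft, hop_length=hop_length)
            angles = np.exp(1j * np.angle(stft))
        return librosa.istft(magnitude * angles,
                             hop_length=hop_length, win_length=n_fft)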