Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis

We describe a sequence-to-sequence neural network which directly generates speech waveforms from text inputs. The architecture extends the Tacotron model by incorporating a normalizing flow into the autoregressive decoder loop...
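To make the idea concrete, below is a minimal PyTorch-style sketch of an autoregressive decoder loop with a conditional normalizing flow in it. All class and parameter names (AffineFlowStep, BlockAutoregressiveDecoder, block_size, n_flows) are hypothetical illustrations, not the paper's implementation: the flow here is a single conditional affine layer standing in for the paper's deeper flow stack, and a fixed text-context vector stands in for per-step attention.

```python
import torch
import torch.nn as nn


class AffineFlowStep(nn.Module):
    """One conditional affine layer: z -> z * exp(s(cond)) + t(cond).
    A stand-in for a deeper normalizing-flow stack (hypothetical)."""

    def __init__(self, block_size, cond_size):
        super().__init__()
        self.net = nn.Linear(cond_size, 2 * block_size)

    def forward(self, z, cond):
        s, t = self.net(cond).chunk(2, dim=-1)
        return z * torch.exp(s) + t  # invertible for any fixed cond


class BlockAutoregressiveDecoder(nn.Module):
    """Autoregressive loop over fixed-length waveform blocks: each step
    updates a recurrent state from the previous block plus a text
    context, then maps Gaussian noise through the conditional flow to
    produce the next block of samples."""

    def __init__(self, block_size=256, cond_size=128, n_flows=4):
        super().__init__()
        self.block_size = block_size
        self.cond_size = cond_size
        self.prenet = nn.Linear(block_size, cond_size)
        self.rnn = nn.GRUCell(2 * cond_size, cond_size)
        self.flows = nn.ModuleList(
            [AffineFlowStep(block_size, cond_size) for _ in range(n_flows)])

    @torch.no_grad()
    def generate(self, text_context, n_blocks):
        # text_context: (batch, cond_size). The real model recomputes
        # attention over encoder outputs at every step; a fixed summary
        # vector is used here only to keep the sketch short.
        batch = text_context.size(0)
        h = text_context.new_zeros(batch, self.cond_size)
        prev = text_context.new_zeros(batch, self.block_size)
        blocks = []
        for _ in range(n_blocks):
            step_in = torch.cat([self.prenet(prev), text_context], dim=-1)
            h = self.rnn(step_in, h)
            z = torch.randn(batch, self.block_size, device=h.device)
            for flow in self.flows:
                z = flow(z, h)  # noise -> waveform block, given state h
            blocks.append(z)
            prev = z
        return torch.cat(blocks, dim=-1)  # (batch, n_blocks * block_size)


decoder = BlockAutoregressiveDecoder()
waveform = decoder.generate(torch.randn(2, 128), n_blocks=8)  # (2, 2048)
```

Because the flow is invertible given the conditioning state, training can evaluate the exact likelihood of each ground-truth block in parallel, while sampling at synthesis time reduces to one noise draw and one flow pass per block.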


Methods used in the Paper

METHOD                  TYPE
Sigmoid Activation      Activation Functions
Residual GRU            Recurrent Neural Networks
Max Pooling             Pooling Operations
Batch Normalization     Normalization
Dropout                 Regularization
BiGRU                   Bidirectional Recurrent Neural Networks
Residual Connection     Skip Connections
Dense Connections       Feedforward Networks
Highway Layer           Miscellaneous Components
GRU                     Recurrent Neural Networks
Tanh Activation         Activation Functions
Convolution             Convolutions
Additive Attention      Attention Mechanisms
Highway Network         Feedforward Networks
Griffin-Lim Algorithm   Phase Reconstruction
CBHG                    Speech Synthesis Blocks
ReLU                    Activation Functions
Tacotron                Text-to-Speech Models