Parallel WaveNet conditioned on VAE latent vectors

Recently, state-of-the-art text-to-speech synthesis systems have shifted to a two-model approach: a sequence-to-sequence model that predicts a representation of speech (typically mel-spectrograms), followed by a 'neural vocoder' model that produces the time-domain speech waveform from this intermediate speech representation. This approach is capable of synthesizing speech that is confusable with natural speech recordings...
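
The two-stage pipeline described above can be summarized in a few lines. The sketch below is purely illustrative: both function names are hypothetical stand-ins for the sequence-to-sequence acoustic model and the neural vocoder, and the shapes are common conventions rather than the paper's settings.

```python
import numpy as np

def acoustic_model(text: str) -> np.ndarray:
    """Hypothetical stand-in for the sequence-to-sequence model:
    maps text to a mel-spectrogram of shape (frames, n_mels)."""
    n_frames, n_mels = 10 * len(text), 80  # illustrative sizes; 80 mel bins is common
    return np.zeros((n_frames, n_mels), dtype=np.float32)

def neural_vocoder(mel: np.ndarray, hop_length: int = 256) -> np.ndarray:
    """Hypothetical stand-in for the neural vocoder: maps the intermediate
    mel-spectrogram to a time-domain waveform (one hop of samples per frame)."""
    return np.zeros(mel.shape[0] * hop_length, dtype=np.float32)

# Stage 1: text -> mel-spectrogram; stage 2: mel-spectrogram -> waveform.
waveform = neural_vocoder(acoustic_model("Hello world"))
```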

Methods used in the Paper


METHOD                              TYPE
Sigmoid Activation                  Activation Functions
Tanh Activation                     Activation Functions
Highway Layer                       Miscellaneous Components
Dilated Causal Convolution          Temporal Convolutions
Mixture of Logistic Distributions   Output Functions
Additive Attention                  Attention Mechanisms
ReLU                                Activation Functions
Convolution                         Convolutions
Griffin-Lim Algorithm               Phase Reconstruction
Highway Network                     Feedforward Networks
Max Pooling                         Pooling Operations
Batch Normalization                 Normalization
Dropout                             Regularization
Dense Connections                   Feedforward Networks
BiGRU                               Bidirectional Recurrent Neural Networks
GRU                                 Recurrent Neural Networks
WaveNet                             Generative Audio Models
VAE                                 Generative Models
Residual Connection                 Skip Connections
CBHG                                Speech Synthesis Blocks
Residual GRU                        Recurrent Neural Networks
Tacotron                            Text-to-Speech Models