Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech

Although early text-to-speech (TTS) models such as Tacotron 2 have succeeded in generating human-like speech, their autoregressive (AR) architectures are limited in that generating a mel-spectrogram consisting of hundreds of steps takes a long time. In this paper, we propose a novel non-autoregressive TTS model called BVAE-TTS, which removes this architectural limitation and generates a mel-spectrogram in parallel...
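To make the AR-versus-parallel distinction concrete, the sketch below contrasts the two decoding styles in PyTorch. The module and variable names (text_encoder, ar_step, parallel_dec, and the shapes) are hypothetical stand-ins, not the paper's actual architecture: an AR decoder must be called once per mel frame, while a non-autoregressive model such as BVAE-TTS emits all frames from a single forward pass.

```python
import torch
import torch.nn as nn

# Hypothetical shapes and modules for illustration only; the real BVAE-TTS
# architecture is described in the paper, not reproduced here.
N_MELS, N_FRAMES, HIDDEN = 80, 800, 256                  # hundreds of mel frames

text_encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)  # stand-in text encoder
ar_step      = nn.GRUCell(N_MELS + HIDDEN, HIDDEN)       # stand-in AR decoder cell
ar_proj      = nn.Linear(HIDDEN, N_MELS)
parallel_dec = nn.Linear(HIDDEN, N_MELS)                 # stand-in parallel decoder

text_hidden, _ = text_encoder(torch.randn(1, N_FRAMES, HIDDEN))

# Autoregressive decoding (Tacotron 2 style): one sequential step per frame,
# so generation latency grows linearly with the number of mel frames.
frame, state, ar_frames = torch.zeros(1, N_MELS), torch.zeros(1, HIDDEN), []
for t in range(N_FRAMES):
    state = ar_step(torch.cat([frame, text_hidden[:, t]], dim=-1), state)
    frame = ar_proj(state)
    ar_frames.append(frame)
ar_mel = torch.stack(ar_frames, dim=1)        # (1, N_FRAMES, N_MELS), built frame by frame

# Non-autoregressive decoding (BVAE-TTS style): all frames come from one
# forward pass over the (already length-expanded) text representation.
parallel_mel = parallel_dec(text_hidden)      # (1, N_FRAMES, N_MELS), single pass

print(ar_mel.shape, parallel_mel.shape)
```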

Methods used in the Paper


METHOD | TYPE
Highway Layer | Miscellaneous Components
Max Pooling | Pooling Operations
Tanh Activation | Activation Functions
Highway Network | Feedforward Networks
Sigmoid Activation | Activation Functions
Dilated Causal Convolution | Temporal Convolutions
Residual GRU | Recurrent Neural Networks
Griffin-Lim Algorithm | Phase Reconstruction
BiGRU | Bidirectional Recurrent Neural Networks
ReLU | Activation Functions
Residual Connection | Skip Connections
CBHG | Speech Synthesis Blocks
LSTM | Recurrent Neural Networks
Convolution | Convolutions
Additive Attention | Attention Mechanisms
GRU | Recurrent Neural Networks
Dense Connections | Feedforward Networks
Tacotron | Text-to-Speech Models
BiLSTM | Bidirectional Recurrent Neural Networks
Zoneout | Regularization
Linear Layer | Feedforward Networks
WaveNet | Generative Audio Models
Dropout | Regularization
Batch Normalization | Normalization
Location Sensitive Attention | Attention Mechanisms
Mixture of Logistic Distributions | Output Functions
Tacotron 2 | Text-to-Speech Models
AutoEncoder | Generative Models
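
Several of the components listed above (Highway Layer, Sigmoid Activation, ReLU, and residual-style gating) come together in one small module. As an illustration, here is a minimal PyTorch sketch of a highway layer of the kind used inside Tacotron's CBHG block; the class and variable names are ours, not taken from the paper.

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """Minimal highway layer: y = H(x) * T(x) + x * (1 - T(x)).

    H is a ReLU-activated linear transform and T is a sigmoid "transform gate";
    when T(x) is near 0 the layer simply carries its input through unchanged.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Linear(dim, dim)   # H(x)
        self.gate = nn.Linear(dim, dim)        # T(x)
        # Bias the gate toward carrying the input early in training,
        # as is commonly recommended for highway networks.
        nn.init.constant_(self.gate.bias, -1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))
        return h * t + x * (1.0 - t)

# Quick shape check on dummy data.
layer = HighwayLayer(128)
print(layer(torch.randn(4, 128)).shape)        # torch.Size([4, 128])
```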