Using previous acoustic context to improve Text-to-Speech synthesis

7 Dec 2020 · Pilar Oplustil-Gallegos, Simon King

Many speech synthesis datasets, especially those derived from audiobooks, naturally comprise sequences of utterances. Nevertheless, such data are commonly treated as individual, unordered utterances both when training a model and at inference time...

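The excerpt truncates the abstract, but the core idea it states, conditioning synthesis on the preceding utterance's acoustics rather than treating utterances as unordered, can be illustrated with a short sketch. The following is a generic illustration under assumed shapes and names (AcousticContextEncoder, the GRU summariser, and concatenating a context embedding to the text-encoder outputs are all assumptions), not the authors' implementation.

```python
# Minimal sketch (PyTorch): conditioning a Tacotron-style model on the
# acoustics of the previous utterance. Shapes, names, and the conditioning
# scheme are illustrative assumptions, not the paper's method.
import torch
import torch.nn as nn

class AcousticContextEncoder(nn.Module):
    """Summarise the previous utterance's mel spectrogram into one vector."""
    def __init__(self, n_mels: int = 80, context_dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, context_dim, batch_first=True)

    def forward(self, prev_mels: torch.Tensor) -> torch.Tensor:
        # prev_mels: (batch, frames, n_mels) from the preceding utterance
        _, h = self.rnn(prev_mels)   # h: (1, batch, context_dim)
        return h.squeeze(0)          # (batch, context_dim)

def condition_text_encoding(text_enc: torch.Tensor,
                            context: torch.Tensor) -> torch.Tensor:
    """Broadcast the context embedding over time and concatenate it to the
    text-encoder outputs, so the attention/decoder can use prior context."""
    # text_enc: (batch, chars, enc_dim); context: (batch, context_dim)
    expanded = context.unsqueeze(1).expand(-1, text_enc.size(1), -1)
    return torch.cat([text_enc, expanded], dim=-1)

# Usage with dummy tensors:
enc = AcousticContextEncoder()
prev = torch.randn(4, 200, 80)   # previous utterance: 200 mel frames
text = torch.randn(4, 50, 512)   # e.g. Tacotron 2 encoder outputs
conditioned = condition_text_encoding(text, enc(prev))
print(conditioned.shape)         # torch.Size([4, 50, 640])
```

At inference time, the same encoder would be fed the audio already synthesised (or recorded) for the previous utterance in the sequence, so context propagates through an audiobook chapter.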

Methods used in the Paper


METHOD                              TYPE
Sigmoid Activation                  Activation Functions
LSTM                                Recurrent Neural Networks
BiLSTM                              Bidirectional Recurrent Neural Networks
Dilated Causal Convolution          Temporal Convolutions
Zoneout                             Regularization
WaveNet                             Generative Audio Models
Linear Layer                        Feedforward Networks
Residual GRU                        Recurrent Neural Networks
Highway Layer                       Miscellaneous Components
Residual Connection                 Skip Connections
Batch Normalization                 Normalization
Mixture of Logistic Distributions   Output Functions
Convolution                         Convolutions
Dense Connections                   Feedforward Networks
Location Sensitive Attention        Attention Mechanisms
Max Pooling                         Pooling Operations
GRU                                 Recurrent Neural Networks
BiGRU                               Bidirectional Recurrent Neural Networks
Tanh Activation                     Activation Functions
Highway Network                     Feedforward Networks
Dropout                             Regularization
Tacotron 2                          Text-to-Speech Models
ReLU                                Activation Functions
CBHG                                Speech Synthesis Blocks
Griffin-Lim Algorithm               Phase Reconstruction
Additive Attention                  Attention Mechanisms
Tacotron                            Text-to-Speech Models