Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis

4 Aug 2018 · Daisy Stanton, Yuxuan Wang, RJ Skerry-Ryan

Global Style Tokens (GSTs) are a recently proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a state-of-the-art end-to-end text-to-speech synthesis system, to uncover expressive factors of variation in speaking style...

Methods used in the Paper


METHOD                  TYPE
Griffin-Lim Algorithm   Phase Reconstruction
Sigmoid Activation      Activation Functions
Highway Layer           Miscellaneous Components
Residual Connection     Skip Connections
Convolution             Convolutions
Batch Normalization     Normalization
Max Pooling             Pooling Operations
Residual GRU            Recurrent Neural Networks
BiGRU                   Bidirectional Recurrent Neural Networks
Highway Network         Feedforward Networks
CBHG                    Speech Synthesis Blocks
ReLU                    Activation Functions
Dropout                 Regularization
Dense Connections       Feedforward Networks
Tanh Activation         Activation Functions
Additive Attention      Attention Mechanisms
GRU                     Recurrent Neural Networks
Tacotron                Text-to-Speech Models