Grapheme or phoneme? An Analysis of Tacotron's Embedded Representations

End-to-end models, particularly Tacotron-based ones, are currently a popular solution for text-to-speech synthesis. They allow the production of high-quality synthesized speech with little to no text preprocessing...

Methods used in the Paper


METHOD                  TYPE
Sigmoid Activation      Activation Functions
Highway Layer           Miscellaneous Components
BiGRU                   Bidirectional Recurrent Neural Networks
GRU                     Recurrent Neural Networks
Max Pooling             Pooling Operations
Convolution             Convolutions
Dense Connections       Feedforward Networks
Tanh Activation         Activation Functions
Highway Network         Feedforward Networks
Dropout                 Regularization
Batch Normalization     Normalization
Residual Connection     Skip Connections
CBHG                    Speech Synthesis Blocks
Residual GRU            Recurrent Neural Networks
Griffin-Lim Algorithm   Phase Reconstruction
Additive Attention      Attention Mechanisms
ReLU                    Activation Functions
Tacotron                Text-to-Speech Models