Whispered and Lombard Neural Speech Synthesis

It is desirable for a text-to-speech system to take into account the environment where synthetic speech is presented, and provide appropriate context-dependent output to the user. In this paper, we present and compare various approaches for generating different speaking styles, namely, normal, Lombard, and whisper speech, using only limited data...

Methods used in the Paper

Method                  Type
Sigmoid Activation      Activation Functions
Highway Layer           Miscellaneous Components
Tanh Activation         Activation Functions
Convolution             Convolutions
Dropout                 Regularization
Highway Network         Feedforward Networks
Additive Attention      Attention Mechanisms
Residual GRU            Recurrent Neural Networks
Max Pooling             Pooling Operations
Dense Connections       Feedforward Networks
ReLU                    Activation Functions
Griffin-Lim Algorithm   Phase Reconstruction
Batch Normalization     Normalization
BiGRU                   Bidirectional Recurrent Neural Networks
Residual Connection     Skip Connections
GRU                     Recurrent Neural Networks
CBHG                    Speech Synthesis Blocks
Tacotron                Text-to-Speech Models