Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system. Deep Voice 3 matches state-of-the-art neural speech synthesis systems in naturalness while training ten times faster...

ICLR 2018


Methods used in the Paper


| Method | Type |
| --- | --- |
| Weight Normalization | Normalization |
| Softmax | Output Functions |
| L1 Regularization | Regularization |
| ReLU | Activation Functions |
| Dense Connections | Feedforward Networks |
| Softsign Activation | Activation Functions |
| Residual Connection | Skip Connections |
| Convolution | Convolutions |
| Dropout | Regularization |
| GLU | Activation Functions |
| Scaled Dot-Product Attention | Attention Mechanisms |
| Griffin-Lim Algorithm | Phase Reconstruction |
| Mixture of Logistic Distributions | Output Functions |
| Dilated Causal Convolution | Temporal Convolutions |
| WaveNet | Generative Audio Models |
| Gradient Clipping | Optimization |
| Adam | Stochastic Optimization |
| DV3 Attention Block | Audio Model Blocks |
| DV3 Convolution Block | Audio Model Blocks |
| Deep Voice 3 | Text-to-Speech Models |