DV3 Convolution Block

Introduced by Ping et al. in Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

The DV3 Convolution Block is the convolutional block used in the Deep Voice 3 text-to-speech architecture. It consists of a 1-D convolution followed by a gated linear unit and a residual connection. In the figure, $c$ denotes the dimensionality of the input. The convolution output, of size $2 \cdot c$, is split into two equal-sized portions: the gate vector and the input vector. The sum of the gated output and the residual input is multiplied by a scaling factor of $\sqrt{0.5}$ to preserve the input variance early in training. The gated linear unit provides a linear path for gradient flow, which alleviates the vanishing gradient problem for stacked convolution blocks while retaining non-linearity. To introduce speaker-dependent control, a speaker-dependent embedding is passed through a softsign function and added as a bias to the convolution filter output. The authors use the softsign nonlinearity because it limits the range of the output while avoiding the saturation problem that exponential-based nonlinearities sometimes exhibit. Convolution filter weights are initialized so that training starts with zero-mean, unit-variance activations throughout the entire network.
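The data flow described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`conv1d_same`, `dv3_conv_block`), the non-causal zero padding, and the choice to add the speaker bias to the input (value) half before gating are all assumptions. The `speaker_bias` argument stands in for an already-projected speaker embedding of size $c$, and dropout is omitted.

```python
import math
import random


def softsign(x):
    # Bounded in (-1, 1); approaches its limits polynomially rather than exponentially.
    return x / (1.0 + abs(x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_same(x, weight, bias):
    """1-D convolution with zero padding so the output keeps length T.

    x: [T][c_in], weight: [c_out][k][c_in] with odd k, bias: [c_out].
    """
    T, k, c_in = len(x), len(weight[0]), len(x[0])
    pad = k // 2
    out = []
    for t in range(T):
        row = []
        for o in range(len(weight)):
            acc = bias[o]
            for j in range(k):
                s = t + j - pad
                if 0 <= s < T:  # positions outside the sequence count as zeros
                    acc += sum(weight[o][j][i] * x[s][i] for i in range(c_in))
            row.append(acc)
        out.append(row)
    return out


def dv3_conv_block(x, weight, bias, speaker_bias=None):
    """Conv -> split into (values, gate) -> GLU -> residual add -> scale by sqrt(0.5)."""
    c = len(x[0])
    h = conv1d_same(x, weight, bias)  # [T][2c]
    out = []
    for t in range(len(x)):
        values, gate = h[t][:c], h[t][c:]
        if speaker_bias is not None:
            # Assumption: the softsign-squashed speaker vector biases the value half.
            values = [v + softsign(s) for v, s in zip(values, speaker_bias)]
        gated = [v * sigmoid(g) for v, g in zip(values, gate)]
        # Residual connection, then rescale so the sum of two signals keeps its variance.
        out.append([(xi + gi) * math.sqrt(0.5) for xi, gi in zip(x[t], gated)])
    return out


random.seed(0)
T, c, k = 5, 4, 3
x = [[random.gauss(0, 1) for _ in range(c)] for _ in range(T)]
weight = [[[random.gauss(0, 0.1) for _ in range(c)] for _ in range(k)]
          for _ in range(2 * c)]
y = dv3_conv_block(x, weight, [0.0] * (2 * c), speaker_bias=[0.5] * c)
```

The output `y` has the same shape as `x` (`T` rows of `c` values), which is what lets these blocks be stacked. With all-zero convolution weights and no speaker bias, the gated path contributes nothing and the block reduces to `x` scaled by $\sqrt{0.5}$.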

Source: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

Tasks

Task	Papers	Share
Speech Synthesis	4	36.36%
Domain Adaptation	2	18.18%
Unsupervised Domain Adaptation	2	18.18%
Melody Extraction	1	9.09%
Retrieval	1	9.09%
Text-To-Speech Synthesis	1	9.09%

Components

Component	Type
Convolution	Convolutions
Dense Connections	Feedforward Networks
Dropout	Regularization
GLU	Activation Functions
Residual Connection	Skip Connections
Softsign Activation	Activation Functions

Categories

Audio Model Blocks

Skip Connection Blocks