DV3 Convolution Block

Introduced by Ping et al. in Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

The DV3 Convolution Block is the convolutional block used in the Deep Voice 3 text-to-speech architecture. It consists of a 1-D convolution with a gated linear unit and a residual connection. In the figure, $c$ denotes the dimensionality of the input. The convolution output, of size $2 \cdot c$, is split into two equal-sized portions: the gate vector and the input vector. The sum of the residual input and the gated output is scaled by $\sqrt{0.5}$ so that the input variance is preserved early in training. The gated linear unit provides a linear path for gradient flow, which alleviates the vanishing gradient problem for stacked convolution blocks while retaining non-linearity. To introduce speaker-dependent control, a speaker-dependent embedding is passed through a softsign function and then added as a bias to the convolution filter output. The authors use the softsign nonlinearity, $x / (1 + |x|)$, because it limits the range of the output while avoiding the saturation problem that exponential-based nonlinearities sometimes exhibit. Convolution filter weights are initialized so that activations have zero mean and unit variance throughout the entire network.
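The data flow described above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the function names (`conv1d_same`, `dv3_conv_block`), the weight layout, the choice of symmetric zero padding, the order of the two halves (input vector first, gate second), and the decision to add the speaker bias to the input-vector half are all assumptions made for the example. Dropout and the layer that projects the speaker embedding to $c$ dimensions are omitted; `speaker_bias` stands in for an already projected embedding.

```python
import math
import random


def softsign(v):
    # Softsign: bounded in (-1, 1) and saturates more slowly than tanh.
    return v / (1.0 + abs(v))


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1d_same(x, weight, bias):
    """Zero-padded 1-D convolution that keeps the sequence length.

    x: T x c_in, weight: k x c_in x c_out, bias: c_out (hypothetical layout).
    """
    k, pad = len(weight), len(weight) // 2
    T, c_in, c_out = len(x), len(x[0]), len(bias)
    out = []
    for t in range(T):
        row = list(bias)
        for j in range(k):
            s = t + j - pad  # source time step for kernel tap j
            if 0 <= s < T:
                for i in range(c_in):
                    for o in range(c_out):
                        row[o] += x[s][i] * weight[j][i][o]
        out.append(row)
    return out


def dv3_conv_block(x, weight, bias, speaker_bias=None):
    """Gated convolution block: conv -> split -> gate -> residual -> scale."""
    c = len(x[0])
    h = conv1d_same(x, weight, bias)  # T x 2c
    out = []
    for x_t, h_t in zip(x, h):
        # Assumed split order: first half is the input vector, second the gate.
        values, gates = h_t[:c], h_t[c:]
        if speaker_bias is not None:
            # Speaker embedding goes through softsign, then acts as a bias.
            values = [v + softsign(s) for v, s in zip(values, speaker_bias)]
        gated = [v * sigmoid(g) for v, g in zip(values, gates)]
        # Residual connection, scaled by sqrt(0.5).
        out.append([math.sqrt(0.5) * (xi + gi) for xi, gi in zip(x_t, gated)])
    return out


# Usage with random, hypothetical parameters: T=6 steps, c=4 channels, k=3.
random.seed(0)
T, c, k = 6, 4, 3
x = [[random.gauss(0, 1) for _ in range(c)] for _ in range(T)]
weight = [[[random.gauss(0, 0.1) for _ in range(2 * c)] for _ in range(c)]
          for _ in range(k)]
y = dv3_conv_block(x, weight, [0.0] * (2 * c),
                   speaker_bias=[random.gauss(0, 1) for _ in range(c)])
print(len(y), len(y[0]))  # output keeps the input shape: 6 4
```

Because the block maps a $T \times c$ input to a $T \times c$ output, such blocks can be stacked directly. The $\sqrt{0.5}$ factor follows from a simple variance argument: if the residual input and the gated output were independent with equal variance $\sigma^2$, their sum would have variance $2\sigma^2$, and scaling by $\sqrt{0.5}$ brings it back to $\sigma^2$.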

Source: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

Latest Papers

PAPER / AUTHORS / DATE
Learning from a Complementary-label Source Domain: Theory and Algorithms
Yiyang Zhang, Feng Liu, Zhen Fang, Bo Yuan, Guangquan Zhang, Jie Lu
2020-08-04
Clarinet: A One-step Approach Towards Budget-friendly Unsupervised Domain Adaptation
Yiyang Zhang, Feng Liu, Zhen Fang, Bo Yuan, Guangquan Zhang, Jie Lu
2020-07-29
CLARINET: A RISC-V Based Framework for Posit Arithmetic Empiricism
Riya Jain, Niraj Sharma, Farhad Merchant, Sachin Patkar, Rainer Leupers
2020-05-30
Parallel Neural Text-to-Speech
Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
2020-01-01
Multi-Speaker End-to-End Speech Synthesis
Jihyun Park, Kexin Zhao, Kainan Peng, Wei Ping
2019-07-09
Non-Autoregressive Neural Text-to-Speech
Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
2019-05-21
Neural source-filter waveform models for statistical parametric speech synthesis
Xin Wang, Shinji Takaki, Junichi Yamagishi
2019-04-27
FloWaveNet : A Generative Flow for Raw Audio
Sungwon Kim, Sang-gil Lee, Jongyoon Song, Sungroh Yoon
2018-11-06
ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech
Wei Ping, Kainan Peng, Jitong Chen
2018-07-19
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller
2017-10-20

Tasks

TASK                             PAPERS   SHARE
Speech Synthesis                 4        44.44%
Domain Adaptation                2        22.22%
Unsupervised Domain Adaptation   2        22.22%
Text-To-Speech Synthesis         1        11.11%

Categories