The Feature-wise Linear Modulation (FiLM) module, used in the WaveGrad model, combines information from the noisy waveform and the input mel-spectrogram. The authors also feed it the iteration index $n$, which indicates the noise level of the input waveform, encoded with the Transformer sinusoidal positional embedding. To condition on the noise level directly, $n$ is replaced by $\sqrt{\bar{\alpha}}$ and a linear scale $C = 5000$ is applied. Given these inputs, the FiLM module produces scale and bias vectors, which a UBlock uses for a feature-wise affine transformation:
$$ \gamma\left(D, \sqrt{\bar{\alpha}}\right) \odot U + \zeta\left(D, \sqrt{\bar{\alpha}}\right) $$
where $\gamma$ and $\zeta$ are the scaling and shift vectors from the FiLM module, $D$ is the output of the corresponding DBlock, $U$ is an intermediate output in the UBlock, and $\odot$ denotes the element-wise (Hadamard) product.
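The two steps described above, embedding the noise level and applying the affine transformation, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the WaveGrad implementation: the function names `noise_level_embedding` and `film_affine` are hypothetical, the embedding layout (sines followed by cosines, with base 10000 frequencies) follows the usual Transformer convention rather than anything stated here, and $\gamma$ and $\zeta$ are taken as given inputs with the same shape as $U$ instead of being computed from $D$ by learned layers.

```python
import math

LINEAR_SCALE = 5000.0  # the linear scale C mentioned in the text


def noise_level_embedding(sqrt_alpha_bar, dim, scale=LINEAR_SCALE):
    """Sinusoidal embedding of a continuous noise level.

    Assumption: Transformer-style frequencies 10000^(-i / half),
    with all sines first and all cosines second.
    """
    half = dim // 2
    freqs = [10000.0 ** (-i / half) for i in range(half)]
    # The scaled noise level replaces the integer position n.
    angles = [scale * sqrt_alpha_bar * f for f in freqs]
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


def film_affine(u, gamma, zeta):
    """Return gamma * u + zeta element-wise.

    Assumption: u, gamma and zeta are nested lists of identical
    shape [channels][time]; in the real model gamma and zeta would
    come from the FiLM module, here they are supplied by the caller.
    """
    return [
        [g * x + z for x, g, z in zip(u_row, g_row, z_row)]
        for u_row, g_row, z_row in zip(u, gamma, zeta)
    ]


if __name__ == "__main__":
    emb = noise_level_embedding(0.5, dim=8)
    print(len(emb))  # embedding has `dim` entries

    u = [[1.0, 2.0], [3.0, 4.0]]
    gamma = [[2.0, 2.0], [0.5, 0.5]]
    zeta = [[1.0, 1.0], [0.0, 0.0]]
    print(film_affine(u, gamma, zeta))
```

Setting every entry of `gamma` to 1 and every entry of `zeta` to 0 leaves `u` unchanged, which is a quick way to check that the modulation reduces to the identity when the FiLM module contributes nothing.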
Source: WaveGrad: Estimating Gradients for Waveform Generation
Task | Papers | Share |
---|---|---|
Speech Synthesis | 5 | 45.45% |
Image Generation | 2 | 18.18% |
Denoising | 2 | 18.18% |
Text-To-Speech Synthesis | 2 | 18.18% |