FiLM Module

Introduced by Chen et al. in WaveGrad: Estimating Gradients for Waveform Generation

The Feature-wise Linear Modulation (FiLM) module, used in the WaveGrad model, combines information from the noisy waveform and the input mel-spectrogram. The authors also feed in the iteration index $n$, which indicates the noise level of the input waveform, encoded with the Transformer sinusoidal positional embedding. To condition on the noise level directly, $n$ is replaced by $\sqrt{\bar{\alpha}}$ and a linear scale $C = 5000$ is applied. Given these inputs, the FiLM module produces a scale vector and a bias vector, which a UBlock uses in a feature-wise affine transformation:

$$ \gamma\left(D, \sqrt{\bar{\alpha}}\right) \odot U + \zeta\left(D, \sqrt{\bar{\alpha}}\right) $$

where $\gamma$ and $\zeta$ are the scale and shift vectors produced by the FiLM module, $\odot$ denotes element-wise multiplication, $D$ is the output of the corresponding DBlock, and $U$ is an intermediate output in the UBlock.
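The two ingredients described above, the sinusoidal encoding of the noise level and the feature-wise affine transformation, can be illustrated with a minimal pure-Python sketch. It is not the WaveGrad implementation. The function names, the concatenated sin/cos layout, the leaky ReLU slope of 0.2, and the scalar weights `w_gamma` and `w_zeta` (standing in for the module's convolutions) are all assumptions made for illustration. Features are stored as a list of channels, each a list of time samples, and the channel count is assumed to be even.

```python
import math


def noise_level_embedding(sqrt_alpha_bar, dim, scale=5000.0):
    """Sinusoidal embedding of the continuous noise level.

    Uses the Transformer positional-encoding frequencies, with the integer
    position replaced by scale * sqrt_alpha_bar (scale plays the role of C).
    Assumes dim is even; sines and cosines are concatenated, not interleaved.
    """
    half = dim // 2
    pos = scale * sqrt_alpha_bar
    freqs = [10000.0 ** (-i / half) for i in range(half)]
    return [math.sin(pos * f) for f in freqs] + [math.cos(pos * f) for f in freqs]


def leaky_relu(x, slope=0.2):
    # The slope value is an assumption, not taken from the text above.
    return x if x > 0 else slope * x


def toy_film(D, sqrt_alpha_bar, w_gamma=1.0, w_zeta=0.5):
    """Hypothetical stand-in for the FiLM module.

    Activates the DBlock features D, adds one noise-embedding value per
    channel, then derives gamma and zeta with scalar weights where the
    real module would use convolutions.
    """
    emb = noise_level_embedding(sqrt_alpha_bar, len(D))
    h = [[leaky_relu(d) + e for d in row] for row, e in zip(D, emb)]
    gamma = [[w_gamma * v for v in row] for row in h]
    zeta = [[w_zeta * v for v in row] for row in h]
    return gamma, zeta


def film_modulate(U, gamma, zeta):
    """Feature-wise affine transform: gamma * U + zeta, element by element."""
    return [
        [g * u + z for g, u, z in zip(g_row, u_row, z_row)]
        for g_row, u_row, z_row in zip(gamma, U, zeta)
    ]


# Toy inputs: 2 channels, 2 time steps each.
D = [[1.0, -1.0], [0.0, 2.0]]
U = [[0.5, 0.5], [1.0, 1.0]]
gamma, zeta = toy_film(D, sqrt_alpha_bar=0.9)
modulated = film_modulate(U, gamma, zeta)
```

Here `modulated` has the same shape as `U`. Each element of `U` is scaled and shifted by values that depend on both the DBlock features and the noise level, which is the role the equation above assigns to $\gamma$ and $\zeta$.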

Source: WaveGrad: Estimating Gradients for Waveform Generation

Tasks

Task	Papers	Share
Speech Synthesis	5	45.45%
Image Generation	2	18.18%
Denoising	2	18.18%
Text-To-Speech Synthesis	2	18.18%

Components

Component	Type
Convolution	Convolutions
Leaky ReLU	Activation Functions

Categories

Audio Model Blocks