Online Normalization

Introduced by Chiley et al. in Online Normalization for Training Neural Networks

Online Normalization is a normalization technique for training deep neural networks. To define Online Normalization, we replace arithmetic averages over the full dataset with exponentially decaying averages of online samples. The decay factors $\alpha_{f}$ and $\alpha_{b}$, for the forward and backward passes respectively, are hyperparameters of the technique.

We allow incoming samples $x_{t}$, such as images, to have multiple scalar components per feature, and denote the feature-wise mean and variance by $\mu\left(x_{t}\right)$ and $\sigma^{2}\left(x_{t}\right)$. The algorithm also applies to outputs of fully connected layers, which have only one scalar component per feature; this case simplifies to $\mu\left(x_{t}\right) = x_{t}$ and $\sigma^{2}\left(x_{t}\right) = 0$. We use the scalars $\mu_{t}$ and $\sigma^{2}_{t}$ to denote running estimates of the mean and variance across all samples, where the subscript $t$ indexes time steps, i.e. the processing of successive incoming samples.
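As a small illustration of the per-sample statistics, here is a NumPy sketch; the shapes (a convolutional activation of shape `(C, H, W)` versus a fully connected output of shape `(C,)`) are our own assumptions for the example:

```python
import numpy as np

# Feature-wise sample statistics mu(x_t) and sigma^2(x_t) for one sample.
# Assumed shapes: conv activation (C, H, W), fully connected output (C,).
x_conv = np.random.randn(64, 8, 8)
mu_conv = x_conv.mean(axis=(1, 2))   # mu(x_t): one value per feature
var_conv = x_conv.var(axis=(1, 2))   # sigma^2(x_t): one value per feature

x_fc = np.random.randn(64)
mu_fc = x_fc                         # degenerate case: mu(x_t) = x_t
var_fc = np.zeros_like(x_fc)         # and sigma^2(x_t) = 0
```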

Online Normalization uses an ongoing process during the forward pass to estimate activation means and variances. It implements the standard online computation of mean and variance, generalized to multi-value samples and to exponential averaging of sample statistics. The resulting estimates directly yield an affine normalization transform:

$$ y_{t} = \frac{x_{t} - \mu_{t-1}}{\sigma_{t-1}} $$

$$ \mu_{t} = \alpha_{f}\mu_{t-1} + \left(1-\alpha_{f}\right)\mu\left(x_{t}\right) $$

$$ \sigma^{2}_{t} = \alpha_{f}\sigma^{2}_{t-1} + \left(1-\alpha_{f}\right)\sigma^{2}\left(x_{t}\right) + \alpha_{f}\left(1-\alpha_{f}\right)\left(\mu\left(x_{t}\right) - \mu_{t-1}\right)^{2} $$
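To make the forward-pass update concrete, here is a minimal NumPy sketch of the three equations above. The class name, the input shape `(features, components)`, the default value of `alpha_f`, and the `eps` guard inside the square root are our own assumptions for illustration; the backward-pass decay $\alpha_{b}$, which governs the paper's gradient normalization, is not shown.

```python
import numpy as np

class OnlineNormForward:
    """Sketch of the Online Normalization forward-pass statistics.

    Inputs are assumed to have shape (features, components), i.e. one
    sample x_t whose feature-wise statistics are taken over `components`.
    """

    def __init__(self, num_features, alpha_f=0.999, eps=1e-5):
        self.alpha_f = alpha_f
        self.eps = eps                     # numerical guard, not in the paper's equations
        self.mu = np.zeros(num_features)   # running mean      mu_{t-1}
        self.var = np.ones(num_features)   # running variance  sigma^2_{t-1}

    def forward(self, x):
        # y_t = (x_t - mu_{t-1}) / sigma_{t-1}: normalize with the
        # *previous* running estimates before updating them.
        y = (x - self.mu[:, None]) / np.sqrt(self.var[:, None] + self.eps)

        # Feature-wise statistics of the incoming sample x_t.
        mu_x = x.mean(axis=1)
        var_x = x.var(axis=1)

        # Exponentially decaying updates of the running estimates.
        a = self.alpha_f
        self.var = a * self.var + (1 - a) * var_x \
            + a * (1 - a) * (mu_x - self.mu) ** 2
        self.mu = a * self.mu + (1 - a) * mu_x
        return y
```

Note that the variance update reads the previous mean $\mu_{t-1}$, so it must be applied before the mean update; swapping the two lines would silently use $\mu_{t}$ in the correction term.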

Source: Online Normalization for Training Neural Networks
