Conditional Batch Normalization

Introduced by de Vries et al. in Modulating early visual processing by language

Conditional Batch Normalization (CBN) is a class-conditional variant of batch normalization. The key idea is to predict the $\gamma$ and $\beta$ parameters of batch normalization from an embedding, e.g., a language embedding in VQA. CBN enables the linguistic embedding to manipulate entire feature maps by scaling them up or down, negating them, or shutting them off. CBN has also been used in GANs to allow class information to affect the batch normalization parameters.

Consider a single convolutional layer with batch normalization module $\text{BN}\left(F_{i,c,h,w}|\gamma_{c}, \beta_{c}\right)$ for which pretrained scalars $\gamma_{c}$ and $\beta_{c}$ are available. We would like to directly predict these affine parameters from, e.g., a language embedding $\mathbf{e_{q}}$. At the start of training, these parameters must be close to the pretrained values to recover the original ResNet model, as a poor initialization could significantly deteriorate performance. Unfortunately, it is difficult to initialize a network to output the pretrained $\gamma$ and $\beta$. For these reasons, the authors propose to predict a change $\delta\beta_{c}$ and $\delta\gamma_{c}$ on the frozen original scalars, for which it is straightforward to initialize a neural network to produce an output with zero mean and small variance.
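To make the initialization argument concrete, here is a minimal NumPy sketch of a delta-predicting one-hidden-layer MLP whose output layer is zero-initialized, so it emits exactly zero at the start of training and the pretrained batch-norm parameters are recovered. The sizes (embedding dimension `E`, channel count `C`, hidden width `H`) are hypothetical, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: question embedding of dim 8, layer with C = 4 channels.
E, C, H = 8, 4, 16

# One-hidden-layer MLP predicting delta_gamma (delta_beta is analogous).
W1 = rng.normal(0, 0.1, size=(E, H))
b1 = np.zeros(H)
# Zero-initializing the output layer guarantees the MLP emits exactly zero
# before any updates, so BN(.|gamma, beta) is unchanged at initialization.
W2 = np.zeros((H, C))
b2 = np.zeros(C)

def predict_delta(e_q):
    h = np.maximum(W1.T @ e_q + b1, 0.0)  # ReLU hidden layer
    return W2.T @ h + b2                  # one value per channel

e_q = rng.normal(size=E)
delta_gamma = predict_delta(e_q)  # all zeros at initialization
```

As training proceeds, gradients flow into `W2` and `b2` and the deltas move away from zero, while the frozen pretrained scalars anchor the starting point.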

The authors use a one-hidden-layer MLP to predict these deltas from a question embedding $\mathbf{e_{q}}$ for all feature maps within the layer:

$$\Delta\beta = \text{MLP}\left(\mathbf{e_{q}}\right)$$

$$\Delta\gamma = \text{MLP}\left(\mathbf{e_{q}}\right)$$

So, given a feature map with $C$ channels, these MLPs output a vector of size $C$. We then add these predictions to the $\beta$ and $\gamma$ parameters:

$$ \hat{\beta}_{c} = \beta_{c} + \Delta\beta_{c} $$

$$ \hat{\gamma}_{c} = \gamma_{c} + \Delta\gamma_{c} $$

Finally, these updated $\hat{\beta}$ and $\hat{\gamma}$ are used as parameters for the batch normalization: $\text{BN}\left(F_{i,c,h,w}|\hat{\gamma}_{c}, \hat{\beta}_{c}\right)$. The authors freeze all ResNet parameters, including $\gamma$ and $\beta$, during training. A ResNet consists of four stages of computation, each subdivided into several residual blocks. In each block, the authors apply CBN to the three convolutional layers.
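The whole mechanism can be sketched end to end in NumPy: normalize the feature map per channel with batch statistics, then apply the frozen $\gamma_{c}$, $\beta_{c}$ shifted by the embedding-predicted deltas. This is a simplified sketch (no running statistics, toy shapes), not the authors' implementation:

```python
import numpy as np

def conditional_batch_norm(F, gamma, beta, delta_gamma, delta_beta, eps=1e-5):
    """Batch-normalize F of shape (N, C, H, W) with per-channel parameters
    gamma + delta_gamma and beta + delta_beta, where gamma/beta are the
    frozen pretrained scalars and the deltas are predicted from an embedding."""
    mean = F.mean(axis=(0, 2, 3), keepdims=True)
    var = F.var(axis=(0, 2, 3), keepdims=True)
    F_hat = (F - mean) / np.sqrt(var + eps)
    g = (gamma + delta_gamma).reshape(1, -1, 1, 1)
    b = (beta + delta_beta).reshape(1, -1, 1, 1)
    return g * F_hat + b

rng = np.random.default_rng(1)
N, C, H, W = 2, 4, 5, 5
F = rng.normal(size=(N, C, H, W))
gamma, beta = np.ones(C), np.zeros(C)

# With zero deltas, CBN reduces to ordinary batch normalization.
out = conditional_batch_norm(F, gamma, beta, np.zeros(C), np.zeros(C))
```

A nonzero `delta_gamma` rescales (or, if it drives $\hat{\gamma}_{c}$ negative, negates) an entire feature map, which is exactly the modulation the linguistic embedding exerts.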

Source: Modulating early visual processing by language
