## Manifold Mixup

Introduced by Verma et al. in *Manifold Mixup: Better Representations by Interpolating Hidden States*.

Manifold Mixup is a regularization method that encourages neural networks to predict less confidently on interpolations of hidden representations. It leverages semantic interpolations as an additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. As a result, neural networks trained with Manifold Mixup learn class-representations with fewer directions of variance.

Consider training a deep neural network $f\left(x\right) = f_{k}\left(g_{k}\left(x\right)\right)$, where $g_{k}$ denotes the part of the neural network mapping the input data to the hidden representation at layer $k$, and $f_{k}$ denotes the part mapping such hidden representation to the output $f\left(x\right)$. Training $f$ using Manifold Mixup is performed in five steps:

(1) Select a random layer $k$ from a set of eligible layers $S$ in the neural network. This set may include the input layer $g_{0}\left(x\right)$.

(2) Process two random data minibatches $\left(x, y\right)$ and $\left(x', y'\right)$ as usual, until reaching layer $k$. This provides us with two intermediate minibatches $\left(g_{k}\left(x\right), y\right)$ and $\left(g_{k}\left(x'\right), y'\right)$.

(3) Perform Input Mixup on these intermediate minibatches. This produces the mixed minibatch:

$$\left(\tilde{g}_{k}, \tilde{y}\right) = \left(\text{Mix}_{\lambda}\left(g_{k}\left(x\right), g_{k}\left(x'\right)\right), \text{Mix}_{\lambda}\left(y, y'\right )\right),$$

where $\text{Mix}_{\lambda}\left(a, b\right) = \lambda \cdot a + \left(1 - \lambda\right) \cdot b$. Here, $\left(y, y' \right)$ are one-hot labels, and the mixing coefficient $\lambda \sim \text{Beta}\left(\alpha, \alpha\right)$ as in mixup. For instance, $\alpha = 1.0$ is equivalent to sampling $\lambda \sim U\left(0, 1\right)$.

(4) Continue the forward pass in the network from layer $k$ until the output using the mixed minibatch $\left(\tilde{g}_{k}, \tilde{y}\right)$.

(5) This output is used to compute the loss value and gradients that update all the parameters of the neural network.
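The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the tiny bias-free ReLU MLP, the helper names (`forward_to`, `forward_from`, `soft_cross_entropy`, `manifold_mixup_loss`), and the layer sizes are all assumptions made for the example. It computes only the loss on the mixed minibatch; in practice the gradients of step (5) would come from an autodiff framework.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward_to(x, weights, k):
    """g_k: apply the first k hidden layers (k = 0 returns the input unchanged)."""
    h = x
    for i in range(k):
        h = relu(h @ weights[i])
    return h

def forward_from(h, weights, k):
    """f_k: apply layers k..end; ReLU between layers, raw logits at the output."""
    for i in range(k, len(weights)):
        h = h @ weights[i]
        if i < len(weights) - 1:
            h = relu(h)
    return h

def soft_cross_entropy(logits, targets):
    """Cross-entropy against soft (mixed) targets, averaged over the minibatch."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilise the softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())

def manifold_mixup_loss(x, y, x2, y2, weights, eligible, alpha, rng):
    # (1) Pick a random eligible layer k; k = 0 mixes the raw inputs.
    k = int(rng.choice(eligible))
    lam = float(rng.beta(alpha, alpha))
    # (2) Run both minibatches up to layer k.
    h, h2 = forward_to(x, weights, k), forward_to(x2, weights, k)
    # (3) Mix hidden states and one-hot labels with the same lambda.
    h_mix = lam * h + (1.0 - lam) * h2
    y_mix = lam * y + (1.0 - lam) * y2
    # (4) Continue the forward pass from layer k with the mixed minibatch.
    logits = forward_from(h_mix, weights, k)
    # (5) Loss on the mixed minibatch (gradients are left to an autodiff tool).
    return soft_cross_entropy(logits, y_mix), k, lam

# Hypothetical toy setup: 8 input features, two hidden layers of 16 units, 3 classes.
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=s) for s in [(8, 16), (16, 16), (16, 3)]]
x, x2 = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
y, y2 = np.eye(3)[[0, 1, 2, 0]], np.eye(3)[[2, 2, 1, 0]]
loss, k, lam = manifold_mixup_loss(
    x, y, x2, y2, weights, eligible=(0, 1, 2), alpha=1.0, rng=rng
)
```

Here `eligible=(0, 1, 2)` plays the role of the set $S$. Because the same $\lambda$ is applied to the hidden states and the labels, each row of the mixed label $\tilde{y}$ still sums to one.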

#### Latest Papers

| Paper | Authors | Date |
| --- | --- | --- |
| PointMixup: Augmentation for Point Clouds | Yunlu Chen, Vincent Tao Hu, Efstratios Gavves, Thomas Mensink, Pascal Mettes, Pengwan Yang, Cees G. M. Snoek | 2020-08-14 |
| Remix: Rebalanced Mixup | Hsin-Ping Chou, Shih-Chieh Chang, Jia-Yu Pan, Wei Wei, Da-Cheng Juan | 2020-07-08 |
| Cross-Lingual Disaster-related Multi-label Tweet Classification with Manifold Mixup | Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea | 2020-07-01 |
| Systematic Evaluation of Backdoor Data Poisoning Attacks on Image Classifiers | Loc Truong, Chace Jones, Brian Hutchinson, Andrew August, Brenda Praggastis, Robert Jasper, Nicole Nichols, Aaron Tuor | 2020-04-24 |
| Charting the Right Manifold: Manifold Mixup for Few-shot Learning | Puneet Mangla, Mayank Singh, Abhishek Sinha, Nupur Kumari, Vineeth N. Balasubramanian, Balaji Krishnamurthy | 2019-07-28 |
| Manifold Mixup: Learning Better Representations by Interpolating Hidden States | Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Aaron Courville, Ioannis Mitliagkas, Yoshua Bengio | 2019-05-01 |
| Manifold Mixup improves text recognition with CTC loss | Bastien Moysset, Ronaldo Messina | 2019-03-11 |
| Manifold Mixup: Better Representations by Interpolating Hidden States | Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, Aaron Courville, David Lopez-Paz, Yoshua Bengio | 2018-06-13 |