Self-Supervised Learning

ReLIC

Introduced by Mitrovic et al. in Representation Learning via Invariant Causal Mechanisms

ReLIC, or Representation Learning via Invariant Causal Mechanisms, is a self-supervised learning objective that enforces invariant prediction of proxy targets across augmentations via an explicit invariance regularizer, which yields improved generalization guarantees.

We can write the objective as:

$$ \underset{X}{\mathbb{E}} \; \underset{a_{lk},\, a_{qt} \sim \mathcal{A}}{\mathbb{E}} \sum_{b \in \{a_{lk},\, a_{qt}\}} \mathcal{L}_{b}\left(Y^{R}, f(X)\right) \quad \text{s.t.} \quad KL\left(p^{do(a_{lk})}\left(Y^{R} \mid f(X)\right),\, p^{do(a_{qt})}\left(Y^{R} \mid f(X)\right)\right) \leq \rho $$

where $\mathcal{L}$ is the proxy task loss and $K L$ is the Kullback-Leibler (KL) divergence. Note that any distance measure on distributions can be used in place of the KL divergence.
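As a concrete illustration of the constraint, here is a minimal PyTorch sketch (the function and argument names are ours, not from the paper) that computes the KL term between the predictive distributions obtained under two augmentation pairs, given their unnormalized scores; any other distance on distributions could be substituted on the last line.

```python
import torch
import torch.nn.functional as F

def invariance_gap(scores_lk: torch.Tensor, scores_qt: torch.Tensor) -> torch.Tensor:
    """KL(p^{do(a_lk)} || p^{do(a_qt)}), averaged over the batch.

    scores_*: [N, M] unnormalized similarities for N points against an
    M-point contrast set, under two different augmentation pairs.
    """
    log_p_lk = F.log_softmax(scores_lk, dim=-1)
    log_p_qt = F.log_softmax(scores_qt, dim=-1)
    # F.kl_div(input, target) computes KL(target || input); both are
    # log-probabilities here because log_target=True
    return F.kl_div(log_p_qt, log_p_lk, log_target=True, reduction="batchmean")
```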

Concretely, as the proxy task we associate with every datapoint $x_{i}$ the label $y_{i}^{R}=i$. This corresponds to the instance discrimination task commonly used in contrastive learning. We take pairs of points $\left(x_{i}, x_{j}\right)$ to compute similarity scores and use pairs of augmentations $a_{lk}=\left(a_{l}, a_{k}\right) \in \mathcal{A} \times \mathcal{A}$ to perform a style intervention. Given a batch of samples $\left(x_{i}\right)_{i=1}^{N} \sim \mathcal{D}$, we use

$$ p^{d o\left(a_{l k}\right)}\left(Y^{R}=j \mid f\left(x_{i}\right)\right) \propto \exp \left(\phi\left(f\left(x_{i}^{a_{l}}\right), h\left(x_{j}^{a_{k}}\right)\right) / \tau\right) $$

with $x^{a}$ denoting the datapoint $x$ augmented with $a$ and $\tau$ a softmax temperature parameter. We encode $f$ using a neural network and choose $h$ to be related to $f$, e.g. $h=f$, or a network whose weights are an exponential moving average of the weights of $f$ (i.e. a target network). To compare representations we use the function $\phi\left(f\left(x_{i}\right), h\left(x_{j}\right)\right)=\left\langle g\left(f\left(x_{i}\right)\right), g\left(h\left(x_{j}\right)\right)\right\rangle$, where $g$ is a fully-connected neural network often called the critic.
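A minimal sketch of these components in PyTorch, assuming a generic backbone; the class, method, and hyperparameter names are illustrative, not from the paper or an official implementation.

```python
import copy
import torch
import torch.nn as nn

class ReLICComponents(nn.Module):
    """Encoder f, target network h (EMA of f), and critic g (illustrative)."""

    def __init__(self, backbone: nn.Module, feat_dim: int,
                 critic_dim: int = 128, ema_decay: float = 0.99):
        super().__init__()
        self.f = backbone                   # online encoder f
        self.h = copy.deepcopy(backbone)    # target network h, EMA copy of f
        for p in self.h.parameters():
            p.requires_grad_(False)         # target is updated by EMA, not SGD
        # critic g: fully-connected head applied to both representations
        self.g = nn.Sequential(nn.Linear(feat_dim, critic_dim),
                               nn.ReLU(),
                               nn.Linear(critic_dim, critic_dim))
        self.ema_decay = ema_decay

    @torch.no_grad()
    def update_target(self):
        """h <- decay * h + (1 - decay) * f, called once per training step."""
        for p_h, p_f in zip(self.h.parameters(), self.f.parameters()):
            p_h.mul_(self.ema_decay).add_(p_f, alpha=1.0 - self.ema_decay)

    def phi(self, x_online: torch.Tensor, x_target: torch.Tensor) -> torch.Tensor:
        """Score matrix phi(f(x_i), h(x_j)) = <g(f(x_i)), g(h(x_j))> for all i, j."""
        z_i = self.g(self.f(x_online))          # [N, d], receives gradients
        with torch.no_grad():
            z_j = self.g(self.h(x_target))      # [N, d], target side, no gradients
        return z_i @ z_j.t()                    # [N, N] inner-product scores
```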

Combining these pieces, we learn representations by minimizing the following objective over the full set of data $x_{i} \in \mathcal{D}$ and augmentations $a_{lk} \in \mathcal{A} \times \mathcal{A}$:

$$ -\sum_{i=1}^{N} \sum_{a_{l k}} \log \frac{\exp \left(\phi\left(f\left(x_{i}^{a_{l}}\right), h\left(x_{i}^{a_{k}}\right)\right) / \tau\right)}{\sum_{m=1}^{M} \exp \left(\phi\left(f\left(x_{i}^{a_{l}}\right), h\left(x_{m}^{a_{k}}\right)\right) / \tau\right)}+\alpha \sum_{a_{l k}, a_{q t}} K L\left(p^{d o\left(a_{l k}\right)}, p^{d o\left(a_{q t}\right)}\right) $$

with $M$ the number of points used to construct the contrast set and $\alpha$ the weight of the invariance penalty. The shorthand $p^{do(a)}$ stands for $p^{do(a)}\left(Y^{R}=j \mid f\left(x_{i}\right)\right)$.
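Putting it together, here is a hedged sketch of the full objective for the common two-view case, where the augmentation pairs are $a_{lk}=(a_1, a_2)$ and $a_{qt}=(a_2, a_1)$ and the contrast set is the batch itself ($M=N$); `phi_fn` stands for a score function like the one sketched above, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def relic_loss(phi_fn, x_a1, x_a2, tau: float = 0.1, alpha: float = 1.0):
    """Contrastive instance-discrimination loss plus KL invariance penalty.

    phi_fn(u, v): returns the [N, N] matrix of scores phi(f(u_i), h(v_j));
    x_a1, x_a2: the same batch of N points under two augmentations.
    """
    n = x_a1.shape[0]
    labels = torch.arange(n, device=x_a1.device)   # instance labels y_i^R = i
    logits_12 = phi_fn(x_a1, x_a2) / tau           # pair a_lk = (a_1, a_2)
    logits_21 = phi_fn(x_a2, x_a1) / tau           # pair a_qt = (a_2, a_1)
    # first term: -log softmax probability of the positive x_i^{a_k}
    contrastive = (F.cross_entropy(logits_12, labels)
                   + F.cross_entropy(logits_21, labels))
    # second term: symmetrized KL between the two predictive distributions
    lp_12 = F.log_softmax(logits_12, dim=-1)
    lp_21 = F.log_softmax(logits_21, dim=-1)
    invariance = (F.kl_div(lp_21, lp_12, log_target=True, reduction="batchmean")
                  + F.kl_div(lp_12, lp_21, log_target=True, reduction="batchmean"))
    return contrastive + alpha * invariance
```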

Source: Representation Learning via Invariant Causal Mechanisms
