OODformer

Introduced by Koner et al. in OODformer: Out-Of-Distribution Detection Transformer

OODformer is a transformer-based OOD detection architecture that leverages the contextualization capabilities of the transformer. Using the transformer as the principal feature extractor makes it possible to exploit object concepts and their discriminative attributes, along with their co-occurrence, via visual attention.

OODformer employs ViT and its data-efficient variant DeiT. Each encoder layer consists of a multi-head self-attention (MSA) block and a multi-layer perceptron (MLP) block. The combination of MSA and MLP layers in the encoder jointly encodes the attributes' importance, associated correlations, and co-occurrence. The [class] token (a representative of an image $x$) consolidates multiple attributes and their related features via the global context. The [class] token from the final layer is used for OOD detection in two ways: first, it is passed to $F_{\text{classifier}}(x_{\text{feat}})$ for a softmax confidence score, and second, it is used for latent-space distance calculation.
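To make the two detection paths concrete, here is a minimal PyTorch sketch, not the paper's reference implementation. It assumes the final-layer [class] token has already been extracted from the ViT/DeiT encoder; `classifier_head` and `class_means` are hypothetical names, and a Euclidean nearest-class-mean distance stands in for the latent-space distance.

```python
import torch
import torch.nn.functional as F

# Sketch of the two OOD scores derived from the final-layer [class] token.
# `classifier_head` (the F_classifier above) and `class_means` are
# hypothetical stand-ins for the trained classification head and the
# per-class means of in-distribution training embeddings.

def softmax_confidence(class_token: torch.Tensor,
                       classifier_head: torch.nn.Module) -> torch.Tensor:
    """Score 1: maximum softmax probability of the classifier head.

    class_token: (batch, dim) [class] embeddings from the final encoder layer.
    Returns (batch,) confidence scores; low confidence suggests OOD.
    """
    logits = classifier_head(class_token)               # (batch, num_classes)
    return F.softmax(logits, dim=-1).max(dim=-1).values

def latent_distance(class_token: torch.Tensor,
                    class_means: torch.Tensor) -> torch.Tensor:
    """Score 2: distance to the nearest in-distribution class mean.

    class_means: (num_classes, dim), estimated on the training set.
    Euclidean distance is used here for illustration; a large distance
    to every class mean suggests OOD.
    """
    dists = torch.cdist(class_token, class_means)       # (batch, num_classes)
    return dists.min(dim=-1).values
```

With a threshold chosen on held-out in-distribution data, inputs whose confidence falls below it (or whose latent distance exceeds it) are flagged as out-of-distribution.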

Source: OODformer: Out-Of-Distribution Detection Transformer
