VirTex, or Visual Representations from Textual Annotations, is a pretraining approach that uses semantically dense captions to learn visual representations. First, a ConvNet and a Transformer are jointly trained from scratch to generate natural language captions for images; the learned visual features are then transferred to downstream visual recognition tasks.
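The pipeline described above can be sketched in a few lines of PyTorch. This is a hypothetical minimal illustration, not the authors' implementation: the toy ConvNet stands in for the paper's ResNet-50 backbone, and the model, layer sizes, and vocabulary size are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class VirTexSketch(nn.Module):
    """Toy sketch of the VirTex idea: a conv backbone plus a Transformer
    decoder, jointly trained to caption images. Not the original code."""

    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        # Visual backbone: a tiny ConvNet standing in for ResNet-50.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # 4x4 grid of visual features
        )
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, images, caption_tokens):
        # Image -> grid of features -> sequence of "memory" vectors
        # attended to by the caption decoder.
        feats = self.backbone(images)              # (B, D, 4, 4)
        memory = feats.flatten(2).transpose(1, 2)  # (B, 16, D)
        tgt = self.embed(caption_tokens)           # (B, T, D)
        # Causal mask so each token only sees earlier caption tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.head(out)                      # (B, T, vocab) logits

model = VirTexSketch()
images = torch.randn(2, 3, 64, 64)
tokens = torch.randint(0, 1000, (2, 5))
logits = model(images, tokens)
print(logits.shape)  # torch.Size([2, 5, 1000])
```

After captioning pretraining, the decoder would be discarded and `model.backbone` reused as the feature extractor for downstream tasks such as classification, detection, or segmentation.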
Source: VirTex: Learning Visual Representations from Textual Annotations
Task | Papers | Share
---|---|---
Image Reconstruction | 1 | 12.50%
Quantization | 1 | 12.50%
General Classification | 1 | 12.50%
Image Captioning | 1 | 12.50%
Image Classification | 1 | 12.50%
Instance Segmentation | 1 | 12.50%
Object Detection | 1 | 12.50%
Semantic Segmentation | 1 | 12.50%