Pyramid Vision Transformer v2

Introduced by Wang et al. in PVT v2: Improved Baselines with Pyramid Vision Transformer

Pyramid Vision Transformer v2 (PVTv2) is a type of Vision Transformer for detection and segmentation tasks. It improves on PVTv1 with three design changes, each orthogonal to the PVTv1 framework: (1) overlapping patch embedding, (2) convolutional feed-forward networks, and (3) linear-complexity attention layers.
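
To make two of these changes concrete, the sketch below works through the arithmetic of overlapping patch embedding and linear-complexity attention. It is illustrative only and rests on assumptions not stated on this page: a 224-pixel input, four stages with strides 4, 2, 2, 2, a patch-embedding convolution with kernel size 2S-1, stride S, and zero padding S-1, and keys/values pooled to a fixed 7x7 grid. The function names are hypothetical and do not come from any released code.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard convolution output length along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def overlapping_patch_grid(size: int, stride: int) -> int:
    """Side length of the token grid after an overlapping patch embedding.

    Assumed layout: kernel 2S-1, stride S, zero padding S-1, so adjacent
    patches overlap while the output is still size / S.
    """
    return conv_out_size(size, kernel=2 * stride - 1, stride=stride,
                         padding=stride - 1)


def attention_pairs(num_queries: int, num_keys: int) -> int:
    """Number of query-key dot products computed by one attention head."""
    return num_queries * num_keys


def stage_summary(image_size: int = 224,          # assumed input resolution
                  strides=(4, 2, 2, 2),           # assumed per-stage strides
                  pool: int = 7):                 # assumed pooled key/value grid
    """Token counts and attention cost per stage under the assumptions above."""
    rows = []
    side = image_size
    for stride in strides:
        side = overlapping_patch_grid(side, stride)
        tokens = side * side
        rows.append({
            "side": side,
            "tokens": tokens,
            # Full self-attention: every token attends to every token.
            "full": attention_pairs(tokens, tokens),
            # Pooled keys/values: cost grows linearly with the token count.
            "linear": attention_pairs(tokens, pool * pool),
        })
    return rows


if __name__ == "__main__":
    for i, row in enumerate(stage_summary(), start=1):
        print(f"stage {i}: {row['side']}x{row['side']} tokens, "
              f"full={row['full']}, linear={row['linear']}")
```

Under these assumptions, the first stage is where the saving is largest, because the key/value count stays fixed at 49 while the query count is highest; in the last stage the two costs coincide because the token grid is already 7x7.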

Source: PVT v2: Improved Baselines with Pyramid Vision Transformer

Tasks

Task	Papers	Share
Camouflaged Object Segmentation	1	16.67%
Zero-shot Generalization	1	16.67%
COVID-19 Diagnosis	1	16.67%
Image Classification	1	16.67%
Object Detection	1	16.67%
Panoptic Segmentation	1	16.67%

Components

Component	Type
Dense Connections	Feedforward Networks
Depthwise Convolution	Convolutions
GELU	Activation Functions
Layer Normalization	Normalization
Multi-Head Attention	Attention Modules
Residual Connection	Skip Connections
Scaled Dot-Product Attention	Attention Mechanisms

Categories

Vision Transformers

Image Models