TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Few-Shot Semantic Segmentation	COCO-20i (1-shot)	VAT (ResNet-101)	Mean IoU	41.3	# 49
Few-Shot Semantic Segmentation	COCO-20i (1-shot)	VAT (ResNet-101)	FB-IoU	68.8	# 22
Few-Shot Semantic Segmentation	COCO-20i (5-shot)	VAT (ResNet-101)	Mean IoU	47.9	# 42
Few-Shot Semantic Segmentation	COCO-20i (5-shot)	VAT (ResNet-101)	FB-IoU	72.4	# 15
Few-Shot Semantic Segmentation	FSS-1000 (1-shot)	VAT (ResNet-101)	Mean IoU	90.3	# 4
Few-Shot Semantic Segmentation	FSS-1000 (1-shot)	VAT (ResNet-101)	FB-IoU	94.0	# 1
Few-Shot Semantic Segmentation	FSS-1000 (1-shot)	VAT (ResNet-50)	Mean IoU	90.1	# 6
Few-Shot Semantic Segmentation	FSS-1000 (1-shot)	VAT (ResNet-50)	FB-IoU	93.8	# 2
Few-Shot Semantic Segmentation	FSS-1000 (5-shot)	VAT (ResNet-101)	Mean IoU	90.8	# 3
Few-Shot Semantic Segmentation	FSS-1000 (5-shot)	VAT (ResNet-101)	FB-IoU	94.4	# 1
Few-Shot Semantic Segmentation	FSS-1000 (5-shot)	VAT (ResNet-50)	Mean IoU	90.7	# 4
Few-Shot Semantic Segmentation	FSS-1000 (5-shot)	VAT (ResNet-50)	FB-IoU	94.2	# 2
Few-Shot Semantic Segmentation	PASCAL-5i (1-Shot)	VAT (ResNet-50)	Mean IoU	65.5	# 43
Few-Shot Semantic Segmentation	PASCAL-5i (1-Shot)	VAT (ResNet-50)	FB-IoU	77.8	# 23
Few-Shot Semantic Segmentation	PASCAL-5i (1-Shot)	VAT (ResNet-101)	Mean IoU	67.9	# 18
Few-Shot Semantic Segmentation	PASCAL-5i (1-Shot)	VAT (ResNet-101)	FB-IoU	79.6	# 8
Few-Shot Semantic Segmentation	PASCAL-5i (5-Shot)	VAT (ResNet-50)	Mean IoU	70.1	# 35
Few-Shot Semantic Segmentation	PASCAL-5i (5-Shot)	VAT (ResNet-50)	FB-IoU	80.9	# 21
Few-Shot Semantic Segmentation	PASCAL-5i (5-Shot)	VAT (ResNet-101)	Mean IoU	72.0	# 16
Few-Shot Semantic Segmentation	PASCAL-5i (5-Shot)	VAT (ResNet-101)	FB-IoU	83.2	# 5
Semantic correspondence	PF-PASCAL	VAT (ECCV)	PCK	92.3	# 6
Semantic correspondence	PF-WILLOW	VAT (ECCV)	PCK	81.6	# 2
Semantic correspondence	SPair-71k	VAT (ECCV)	PCK	55.5	# 8
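
For readers unfamiliar with the segmentation metrics in the table, the following is a minimal sketch of how IoU and FB-IoU can be computed for a single predicted mask. It assumes binary masks stored as nested lists (1 = foreground, 0 = background); the function names `iou` and `fb_iou` and the toy masks are illustrative and not taken from the paper's code. Benchmark numbers are normally accumulated over a whole test set and reported in percent, which this single-image sketch does not do.

```python
def iou(pred, target, label):
    """IoU of the pixels equal to `label` in two same-sized 2D masks."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p_hit = p == label
            t_hit = t == label
            inter += p_hit and t_hit
            union += p_hit or t_hit
    # Convention assumed here: an empty union counts as a perfect match.
    return inter / union if union else 1.0


def fb_iou(pred, target):
    """Average of foreground IoU (label 1) and background IoU (label 0)."""
    return 0.5 * (iou(pred, target, 1) + iou(pred, target, 0))


# Toy masks with made-up values.
pred = [[1, 1, 0],
        [0, 0, 0]]
target = [[1, 0, 0],
          [0, 1, 0]]

foreground = iou(pred, target, 1)
background = iou(pred, target, 0)
combined = fb_iou(pred, target)
```

Mean IoU, as usually reported for few-shot segmentation, averages the foreground IoU over object classes, whereas FB-IoU ignores the class and averages foreground and background IoU.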

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cost-aggregation-with-4d-convolutional-swin/semantic-correspondence-on-pf-willow)](https://paperswithcode.com/sota/semantic-correspondence-on-pf-willow?p=cost-aggregation-with-4d-convolutional-swin)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cost-aggregation-with-4d-convolutional-swin/few-shot-semantic-segmentation-on-fss-1000-5)](https://paperswithcode.com/sota/few-shot-semantic-segmentation-on-fss-1000-5?p=cost-aggregation-with-4d-convolutional-swin)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cost-aggregation-with-4d-convolutional-swin/few-shot-semantic-segmentation-on-fss-1000-1)](https://paperswithcode.com/sota/few-shot-semantic-segmentation-on-fss-1000-1?p=cost-aggregation-with-4d-convolutional-swin)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cost-aggregation-with-4d-convolutional-swin/semantic-correspondence-on-pf-pascal)](https://paperswithcode.com/sota/semantic-correspondence-on-pf-pascal?p=cost-aggregation-with-4d-convolutional-swin)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cost-aggregation-with-4d-convolutional-swin/semantic-correspondence-on-spair-71k)](https://paperswithcode.com/sota/semantic-correspondence-on-spair-71k?p=cost-aggregation-with-4d-convolutional-swin)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cost-aggregation-with-4d-convolutional-swin/few-shot-semantic-segmentation-on-pascal-5i-5)](https://paperswithcode.com/sota/few-shot-semantic-segmentation-on-pascal-5i-5?p=cost-aggregation-with-4d-convolutional-swin)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cost-aggregation-with-4d-convolutional-swin/few-shot-semantic-segmentation-on-pascal-5i-1)](https://paperswithcode.com/sota/few-shot-semantic-segmentation-on-pascal-5i-1?p=cost-aggregation-with-4d-convolutional-swin)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cost-aggregation-with-4d-convolutional-swin/few-shot-semantic-segmentation-on-coco-20i-5)](https://paperswithcode.com/sota/few-shot-semantic-segmentation-on-coco-20i-5?p=cost-aggregation-with-4d-convolutional-swin)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cost-aggregation-with-4d-convolutional-swin/few-shot-semantic-segmentation-on-coco-20i-1)](https://paperswithcode.com/sota/few-shot-semantic-segmentation-on-coco-20i-1?p=cost-aggregation-with-4d-convolutional-swin)`

Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation

22 Jul 2022 · Sunghwan Hong, Seokju Cho, Jisu Nam, Stephen Lin, Seungryong Kim

This paper presents a novel cost aggregation network, called Volumetric Aggregation with Transformers (VAT), for few-shot segmentation. The use of transformers can benefit correlation map aggregation through self-attention over a global receptive field. However, the tokenization of a correlation map for transformer processing can be detrimental, because the discontinuity at token boundaries reduces the local context available near the token edges and decreases inductive bias. To address this problem, we propose a 4D Convolutional Swin Transformer, where a high-dimensional Swin Transformer is preceded by a series of small-kernel convolutions that impart local context to all pixels and introduce convolutional inductive bias. We additionally boost aggregation performance by applying transformers within a pyramidal structure, where aggregation at a coarser level guides aggregation at a finer level. Noise in the transformer output is then filtered in the subsequent decoder with the help of the query's appearance embedding. With this model, a new state-of-the-art is set for all the standard benchmarks in few-shot segmentation. It is shown that VAT attains state-of-the-art performance for semantic correspondence as well, where cost aggregation also plays a central role.
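
To make the term "correlation map" concrete, here is a minimal sketch of the kind of 4D tensor that a cost aggregation network takes as input. It is not the VAT implementation: it assumes cosine similarity between query and support feature vectors and clamps negative scores to zero, and the names `cosine` and `correlation_map` as well as the toy feature values are hypothetical. The convolutional and Swin Transformer stages described in the abstract would operate on a tensor of this shape and are not shown.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def correlation_map(query, support):
    """Build a 4D correlation map corr[i][j][k][l].

    query and support are feature maps laid out as [height][width][channels].
    Entry (i, j, k, l) scores query position (i, j) against support
    position (k, l). Clamping negative scores to zero is an assumption
    of this sketch, not something the abstract states.
    """
    return [
        [
            [
                [max(0.0, cosine(q_vec, s_vec)) for s_vec in s_row]
                for s_row in support
            ]
            for q_vec in q_row
        ]
        for q_row in query
    ]


# Toy 2x2 feature maps with 2 channels each (made-up values).
query = [[[1.0, 0.0], [0.0, 1.0]],
         [[1.0, 1.0], [-1.0, 0.0]]]
support = [[[1.0, 0.0], [0.0, 2.0]],
           [[1.0, 1.0], [0.0, -1.0]]]

corr = correlation_map(query, support)
```

For feature maps of size H×W, the map has H·W·H·W entries, which is why tokenizing it for a transformer and keeping local context at token boundaries are central concerns in the abstract.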

Code

Seokju-Cho/Volumetric-Aggregation-T… official

145

Tasks

Decoder

Few-Shot Semantic Segmentation

Inductive Bias

Semantic correspondence

Datasets

MS COCO

PASCAL-5i

SPair-71k

PF-PASCAL

FSS-1000

PF-WILLOW

Results from the Paper

Ranked #2 on Semantic correspondence on PF-WILLOW

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Stochastic Depth • Swin Transformer • Transformer
