TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Weakly-Supervised Semantic Segmentation	COCO 2014 val	WeakTr (ViT-S, multi-stage)	mIoU	50.3	# 7
Weakly-Supervised Semantic Segmentation	COCO 2014 val	WeakTr (DeiT-S, multi-stage)	mIoU	46.9	# 9
Weakly-Supervised Semantic Segmentation	PASCAL VOC 2012 test	WeakTr (DeiT-S, multi-stage)	Mean IoU	74.1	# 11
Weakly-Supervised Semantic Segmentation	PASCAL VOC 2012 test	WeakTr (ViT-S, multi-stage)	Mean IoU	79.0	# 4
Weakly-Supervised Semantic Segmentation	PASCAL VOC 2012 train	WeakTr (DeiT-S, single-stage)	Mean IoU	76.5	# 2
Weakly-Supervised Semantic Segmentation	PASCAL VOC 2012 val	WeakTr (DeiT-S, multi-stage)	Mean IoU	74.0	# 12
Weakly-Supervised Semantic Segmentation	PASCAL VOC 2012 val	WeakTr (ViT-S, multi-stage)	Mean IoU	78.4	# 5
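The metric in the table above, mIoU (listed as "Mean IoU" for the PASCAL VOC rows), is conventionally the per-class intersection-over-union averaged over classes. The following is a minimal sketch of that standard definition on flattened label lists. It is not the benchmark's evaluation script: the function name `mean_iou`, the `ignore_index=255` convention, and the toy labels are assumptions for illustration, and real evaluations accumulate counts over the whole dataset before averaging.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred and target are flat sequences of integer class labels of equal length.
    Pixels whose ground-truth label equals ignore_index are skipped (assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where pred == target == c
    union = [0] * num_classes         # pixels where pred == c or target == c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch counts toward the union of both classes involved.
            union[p] += 1
            union[t] += 1

    ious = [intersection[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes; the last pixel is ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
miou = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * miou:.1f}%")
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is their average; the table reports the same quantity as a percentage.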

WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation

3 Apr 2023 · Lianghui Zhu, Yingyue Li, Jiemin Fang, Yan Liu, Hao Xin, Wenyu Liu, Xinggang Wang

This paper explores the properties of the plain Vision Transformer (ViT) for Weakly-supervised Semantic Segmentation (WSSS). The class activation map (CAM) is critical both for understanding a classification network and for launching WSSS. We observe that different attention heads of ViT focus on different image areas. We therefore propose a novel weight-based method that estimates the importance of each attention head end-to-end and adaptively fuses the self-attention maps into high-quality CAM results that tend to cover objects more completely. In addition, we propose a ViT-based gradient clipping decoder for online retraining with the CAM results to complete the WSSS task. We name this plain Transformer-based weakly-supervised learning framework WeakTr. It achieves state-of-the-art WSSS performance on standard benchmarks: 78.4% mIoU on the val set of PASCAL VOC 2012 and 50.3% mIoU on the val set of COCO 2014. Code is available at https://github.com/hustvl/WeakTr.
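The head-weighting idea in the abstract can be illustrated with a small sketch: each attention head contributes a map over image patches, each head gets an importance weight, and the maps are combined as a weighted sum. This is not the authors' implementation (see the official repository for that). The function names, the softmax normalization of the weights, and the toy numbers are assumptions; in the paper the weights are estimated end-to-end, whereas here they are fixed by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attention_heads(head_maps, head_logits):
    """Fuse per-head attention maps with normalized importance weights.

    head_maps:   one flat list of patch scores per head (hypothetical layout).
    head_logits: one unnormalized importance score per head (hand-set here).
    """
    if len(head_maps) != len(head_logits):
        raise ValueError("need exactly one importance score per head")

    weights = softmax(head_logits)
    fused = [0.0] * len(head_maps[0])
    for weight, attention_map in zip(weights, head_maps):
        for i, value in enumerate(attention_map):
            fused[i] += weight * value
    return fused


# Toy example: three heads over four patches.
head_maps = [
    [0.9, 0.1, 0.0, 0.0],      # head focused on the first patch
    [0.0, 0.0, 0.8, 0.2],      # head focused on the third patch
    [0.25, 0.25, 0.25, 0.25],  # diffuse, uninformative head
]
head_logits = [2.0, 2.0, -1.0]  # the diffuse head is down-weighted

fused = fuse_attention_heads(head_maps, head_logits)
print([round(v, 3) for v in fused])
```

Down-weighting the diffuse head keeps the fused map concentrated on the patches the informative heads attend to, which is the intuition behind fusing heads by importance rather than averaging them uniformly.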

Code

hustvl/weaktr (official) · 116 stars

Tasks

Semantic Segmentation

Weakly-supervised Learning

Weakly supervised Semantic Segmentation

Weakly-Supervised Semantic Segmentation

Datasets

ImageNet

MS COCO

ImageNet-1K

PASCAL VOC 2012 test

Results from the Paper

Ranked #2 on Weakly-Supervised Semantic Segmentation on PASCAL VOC 2012 train

Methods

Absolute Position Encodings • Adam • BPE • CAM • Dense Connections • Dropout • Gradient Clipping • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer • Vision Transformer
