TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semantic Segmentation	Cityscapes	DiffSeg (512)	mIoU	21.2	# 2
Semantic Segmentation	Cityscapes	DiffSeg (512)	Pixel Accuracy	76	# 1
Semantic Segmentation	COCO-Stuff-27	DiffSeg (512)	mIoU	43.6	# 1
Semantic Segmentation	COCO-Stuff-27	DiffSeg (512)	Pixel Accuracy	72.5	# 1

Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion

23 Aug 2023 · Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar Gonzalez-Franco

Producing quality segmentation masks for images is a fundamental problem in computer vision. Recent research has explored large-scale supervised training to enable zero-shot segmentation on virtually any image style, and unsupervised training to enable segmentation without dense annotations. However, constructing a model capable of segmenting anything in a zero-shot manner without any annotations is still challenging. In this paper, we propose to utilize the self-attention layers in Stable Diffusion models to achieve this goal, because the pre-trained Stable Diffusion model has learned inherent concepts of objects within its attention layers. Specifically, we introduce a simple yet effective iterative merging process that measures KL divergence among attention maps and merges them into valid segmentation masks. The proposed method requires neither training nor language dependency to extract quality segmentation for any image. On COCO-Stuff-27, our method surpasses the prior unsupervised zero-shot SOTA method by an absolute 26% in pixel accuracy and 17% in mean IoU. The project page is at https://sites.google.com/view/diffseg/home.
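To make the merging idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes the attention maps are already available as non-negative 2D arrays of equal shape, and it uses a greedy grouping with a symmetric KL divergence and a hand-picked threshold. The function names `symmetric_kl`, `merge_attention_maps`, and `proposals_to_mask`, and the parameters `threshold` and `max_iters`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def symmetric_kl(p, q, eps=1e-8):
    """Symmetric KL divergence between two maps, each normalized to sum to 1."""
    p = np.asarray(p, dtype=np.float64).ravel() + eps
    q = np.asarray(q, dtype=np.float64).ravel() + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def merge_attention_maps(maps, threshold=1.0, max_iters=10):
    """Greedily merge maps whose divergence to an anchor is below `threshold`.

    Assumption: this simplified loop stands in for the iterative merging
    described in the abstract; it is not the published procedure.
    """
    proposals = [np.asarray(m, dtype=np.float64) for m in maps]
    for _ in range(max_iters):
        merged = []
        used = [False] * len(proposals)
        for i, anchor in enumerate(proposals):
            if used[i]:
                continue
            used[i] = True
            group = [anchor]
            for j in range(i + 1, len(proposals)):
                if not used[j] and symmetric_kl(anchor, proposals[j]) < threshold:
                    group.append(proposals[j])
                    used[j] = True
            # Represent each group by the mean of its member maps.
            merged.append(np.mean(group, axis=0))
        if len(merged) == len(proposals):
            break  # nothing was merged in this pass, so stop
        proposals = merged
    return proposals


def proposals_to_mask(proposals):
    """Assign each pixel to the proposal with the highest activation."""
    return np.argmax(np.stack(proposals, axis=0), axis=0)


# Toy input: two maps that attend to the left half, two to the right half.
left = np.zeros((4, 4))
left[:, :2] = 1.0
right = np.zeros((4, 4))
right[:, 2:] = 1.0
demo_maps = [left, 0.9 * left + 0.01, right, 0.8 * right + 0.02]

demo_proposals = merge_attention_maps(demo_maps, threshold=1.0)
demo_mask = proposals_to_mask(demo_proposals)
print(len(demo_proposals))
print(demo_mask)
```

On the toy input, maps that attend to the same half of the image are close in KL divergence and collapse into one proposal, while maps that attend to different halves stay separate. The threshold controls how aggressively maps are merged, so it would need tuning on real attention maps.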

Code

google/diffseg official

212

Tasks

Segmentation

Semantic Segmentation

valid

Zero Shot Segmentation

Datasets

Cityscapes

DomainNet

COCO-Stuff

Results from the Paper

Ranked #1 on Semantic Segmentation on COCO-Stuff-27
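The two metrics reported for DiffSeg, mIoU and pixel accuracy, can both be derived from a confusion matrix. The sketch below is a generic illustration, not the benchmark's evaluation code. It assumes predicted and ground-truth labels are integer arrays that already share one label space; unsupervised methods typically need an extra step that matches predicted clusters to ground-truth classes first, which is omitted here. The function names are hypothetical.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Rows index ground-truth classes, columns index predicted classes."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    idx = target * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted label equals the ground truth."""
    return float(np.trace(cm) / cm.sum())


def mean_iou(cm):
    """Mean of per-class intersection over union, skipping absent classes."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return float(np.mean(tp[valid] / union[valid]))


# Tiny 2x2 example with two classes.
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
cm = confusion_matrix(pred, target, num_classes=2)
print(pixel_accuracy(cm), mean_iou(cm))
```

The example shows why a method can score much higher on pixel accuracy than on mIoU: pixel accuracy weights every pixel equally, so frequent classes dominate, while mIoU averages over classes and penalizes errors on rare ones.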

Methods

Diffusion
