TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Zero-Shot Composed Image Retrieval (ZS-CIR)	CIRCO	SEARLE-XL	mAP@10	12.73	# 6
Zero-Shot Composed Image Retrieval (ZS-CIR)	CIRCO	SEARLE	mAP@10	9.94	# 8
Zero-Shot Composed Image Retrieval (ZS-CIR)	CIRR	SEARLE-XL	R@5	52.48	# 12
Zero-Shot Composed Image Retrieval (ZS-CIR)	CIRR	SEARLE	R@5	53.42	# 10
Zero-Shot Composed Image Retrieval (ZS-CIR)	FashionIQ	SEARLE-XL-OTI	R@10	27.61	# 1
Zero-Shot Composed Image Retrieval (ZS-CIR)	Fashion IQ	SEARLE-XL	(Recall@10+Recall@50)/2	35.90	# 10
Zero-Shot Composed Image Retrieval (ZS-CIR)	Fashion IQ	SEARLE	(Recall@10+Recall@50)/2	32.71	# 12
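
As a reading aid for the metrics listed above, the following is a minimal sketch of how Recall@K, the averaged recall (Recall@10+Recall@50)/2, and mAP@K can be computed for ranked retrieval results. It is not the official evaluation code of any of these benchmarks. The function names are hypothetical, and normalizing AP@K by min(K, number of ground truths) is an assumed convention for queries with multiple ground truths. The table values appear to be percentages, whereas the sketch returns fractions in [0, 1].

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], targets: Set[str], k: int) -> float:
    """Return 1.0 if any ground-truth item appears in the top-k results, else 0.0."""
    return float(any(item in targets for item in ranked[:k]))


def averaged_recall(recall_at_10: float, recall_at_50: float) -> float:
    """Average of Recall@10 and Recall@50, i.e. (Recall@10+Recall@50)/2."""
    return (recall_at_10 + recall_at_50) / 2


def average_precision_at_k(ranked: Sequence[str], targets: Set[str], k: int) -> float:
    """AP@k for a single query that may have several ground truths.

    Assumption: the sum of precisions at the hit positions is divided by
    min(k, number of ground truths).
    """
    if not targets:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in targets:
            hits += 1
            precision_sum += hits / rank  # precision at this hit position
    return precision_sum / min(k, len(targets))


def mean_over_queries(scores: Sequence[float]) -> float:
    """Average a per-query metric over all queries (e.g. AP@k -> mAP@k)."""
    return sum(scores) / len(scores)
```

For example, with the ranking `["a", "b", "c", "d"]` and ground truths `{"a", "c"}`, the hits fall at ranks 1 and 3, so AP@10 is (1/1 + 2/3) / 2.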

Zero-Shot Composed Image Retrieval with Textual Inversion

ICCV 2023 · Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto del Bimbo

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that describes the difference between the two images. The high effort and cost required for labeling datasets for CIR hamper the widespread usage of existing methods, as they rely on supervised learning. In this work, we propose a new task, Zero-Shot CIR (ZS-CIR), that aims to address CIR without requiring a labeled training dataset. Our approach, named zero-Shot composEd imAge Retrieval with textuaL invErsion (SEARLE), maps the visual features of the reference image into a pseudo-word token in CLIP token embedding space and integrates it with the relative caption. To support research on ZS-CIR, we introduce an open-domain benchmarking dataset named Composed Image Retrieval on Common Objects in context (CIRCO), which is the first dataset for CIR containing multiple ground truths for each query. The experiments show that SEARLE exhibits better performance than the baselines on the two main datasets for CIR tasks, FashionIQ and CIRR, and on the proposed CIRCO. The dataset, the code and the model are publicly available at https://github.com/miccunifi/SEARLE.
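
The idea described in the abstract (map the reference image to a pseudo-word, combine it with the relative caption, and retrieve with the resulting text query) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the tiny embedding table stands in for CLIP's token embeddings, `invert_image` and `encode_text` are hypothetical placeholders for the textual inversion network and the CLIP text encoder, and the prompt template is made up. It is not the SEARLE implementation, which is available in the repository linked above.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical toy token embeddings (stand-in for CLIP's token embedding table).
TOKEN_EMBEDDINGS: Dict[str, Vector] = {
    "a": [0.1, 0.0, 0.0],
    "photo": [0.0, 0.1, 0.0],
    "of": [0.0, 0.0, 0.1],
    "red": [1.0, 0.0, 0.0],
    "blue": [0.0, 0.0, 1.0],
}
PLACEHOLDER = "$"  # marks where the pseudo-word token goes in the prompt


def invert_image(image_features: Vector) -> Vector:
    """Placeholder for a textual inversion network (here: the identity map)."""
    return list(image_features)


def encode_text(tokens: List[str], pseudo_word: Vector) -> Vector:
    """Placeholder text encoder: mean of token embeddings, with the
    pseudo-word embedding substituted for the placeholder token."""
    vectors = [pseudo_word if t == PLACEHOLDER else TOKEN_EMBEDDINGS[t] for t in tokens]
    dim = len(pseudo_word)
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve(reference_features: Vector, caption_tokens: List[str],
             gallery: Dict[str, Vector]) -> List[str]:
    """Rank gallery images by similarity to the composed text query."""
    pseudo_word = invert_image(reference_features)
    # Assumed template: "a photo of $ <relative caption>".
    query_tokens = ["a", "photo", "of", PLACEHOLDER] + caption_tokens
    query = encode_text(query_tokens, pseudo_word)
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)


# Toy gallery: dimension 0 ~ "red", dimension 1 ~ "shirt", dimension 2 ~ "blue".
GALLERY = {
    "red_shirt": [1.0, 1.0, 0.0],
    "blue_shirt": [0.0, 1.0, 1.0],
    "blue_dress": [0.0, 0.0, 1.0],
}
ranking = retrieve([0.0, 1.0, 0.0], ["blue"], GALLERY)
```

In this toy setup the reference image carries only the "shirt" direction and the caption adds "blue", so the composed query ranks `blue_shirt` first. No labeled CIR triplets are involved at retrieval time, which is the sense in which the setting is zero-shot.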

Code

miccunifi/searle official

miccunifi/circo official

explainableml/vision_by_language

Tasks

Composed Image Retrieval (CoIR)

Image Retrieval

Retrieval

Zero-Shot Composed Image Retrieval (ZS-CIR)

Datasets

Introduced in the Paper:

CIRCO

Used in the Paper:

MS COCO

Visual Question Answering

Fashion IQ

CIRR

Results from the Paper

Ranked #1 on Zero-Shot Composed Image Retrieval (ZS-CIR) on FashionIQ

Methods

CLIP
