Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval

13 Nov 2023 · Junyang Chen, Hanjiang Lai

Zero-shot composed image retrieval (ZS-CIR), which aims to retrieve a target image based on textual modifications to a reference image without triplet labeling, has attracted increasing attention. Current ZS-CIR research mainly relies on two models pre-trained on unlabeled data: a vision-language model, e.g., CLIP, and a Pic2Word/textual-inversion model. However, there is a substantial discrepancy between these pre-trained models and the CIR task: the pre-trained models learn similarities between vision and language, whereas CIR aims to learn text-guided modifications of an image. In this paper, we introduce a novel unlabeled, pre-trained masked tuning approach to reduce the gap between the pre-trained model and the downstream CIR task. We first reformulate pre-trained vision-language contrastive learning as a CIR task, randomly masking input image patches to generate a $\langle$masked image, text, image$\rangle$ triplet from each image-text pair. We then propose masked tuning, which uses the text and the masked image to learn the modifications of the original image. With this simple design, the model learns to capture fine-grained text-guided modifications. Extensive experimental results demonstrate the significant superiority of our approach over baseline models on three ZS-CIR datasets: FashionIQ, CIRR, and CIRCO.
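To make the triplet construction concrete, below is a minimal PyTorch sketch of the masked-tuning idea described in the abstract: random image patches are zeroed out, and the (masked image, text) pair is trained contrastively to match the embedding of the original image. The helper names (`mask_patches`, `masked_tuning_loss`), the 0.75 mask ratio, the additive query fusion, and the InfoNCE-style loss are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mask_patches(images, patch_size=16, mask_ratio=0.75):
    """Randomly zero out a fraction of non-overlapping patches.

    images: (B, C, H, W) with H and W divisible by patch_size.
    mask_ratio of 0.75 is an assumption; the paper may use another value.
    """
    B, C, H, W = images.shape
    ph, pw = H // patch_size, W // patch_size
    num_patches = ph * pw
    num_masked = int(mask_ratio * num_patches)

    # Pick `num_masked` random patch indices per image.
    noise = torch.rand(B, num_patches, device=images.device)
    masked_idx = noise.argsort(dim=1)[:, :num_masked]   # (B, num_masked)
    keep = torch.ones(B, num_patches, device=images.device)
    keep.scatter_(1, masked_idx, 0.0)                   # 0 marks a masked patch

    # Broadcast the patch-level mask back to pixel resolution.
    keep = keep.view(B, 1, ph, pw)
    keep = keep.repeat_interleave(patch_size, dim=2)
    keep = keep.repeat_interleave(patch_size, dim=3)
    return images * keep

def masked_tuning_loss(image_encoder, text_encoder, images, texts, tau=0.07):
    """Contrastive loss: (masked image + text) query vs. original-image targets.

    image_encoder / text_encoder stand in for CLIP- or BLIP-style encoders
    that map their inputs to a shared embedding space.
    """
    masked = mask_patches(images)
    # Hypothetical fusion: sum the two embeddings; the paper's fusion may differ.
    query = F.normalize(image_encoder(masked) + text_encoder(texts), dim=-1)
    target = F.normalize(image_encoder(images), dim=-1)
    logits = query @ target.t() / tau                   # (B, B) similarity matrix
    labels = torch.arange(images.size(0), device=images.device)
    return F.cross_entropy(logits, labels)
```

Because both the masked image and the target come from the same unlabeled image-text pair, this objective mirrors the CIR inference setup (reference image + modifying text → target image) without requiring any triplet annotation.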

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Zero-Shot Composed Image Retrieval (ZS-CIR) | CIRCO | MTCIR (BLIP B/16) | mAP@10 | 8.03 | #10 |
| Zero-Shot Composed Image Retrieval (ZS-CIR) | CIRCO | MTCIR (CLIP L/14) | mAP@10 | 11.63 | #7 |
| Zero-Shot Composed Image Retrieval (ZS-CIR) | CIRR | MTCIR (BLIP B/16) | R@5 | 58.87 | #5 |
| Zero-Shot Composed Image Retrieval (ZS-CIR) | CIRR | MTCIR (CLIP L/14) | R@5 | 54.58 | #8 |
| Zero-Shot Composed Image Retrieval (ZS-CIR) | Fashion IQ | MTCIR (CLIP L/14) | (Recall@10+Recall@50)/2 | 46.42 | #2 |
