Data Roaming and Quality Assessment for Composed Image Retrieval

16 Mar 2023 · Matan Levy, Rami Ben-Ari, Nir Darshan, Dani Lischinski

The task of Composed Image Retrieval (CoIR) involves queries that combine image and text modalities, allowing users to express their intent more effectively. However, current CoIR datasets are orders of magnitude smaller than other vision and language (V&L) datasets. Additionally, some of these datasets have noticeable issues, such as queries containing redundant modalities. To address these shortcomings, we introduce the Large Scale Composed Image Retrieval (LaSCo) dataset, a new CoIR dataset that is ten times larger than existing ones. Pre-training on LaSCo yields a noteworthy improvement in performance, even in the zero-shot setting. Furthermore, we propose a new approach for analyzing CoIR datasets and methods, which detects whether each modality in a query is redundant or necessary. We also introduce a new CoIR baseline, the Cross-Attention driven Shift Encoder (CASE). This baseline performs early fusion of the modalities using a cross-attention module and employs an additional auxiliary task during training. Our experiments demonstrate that this new baseline outperforms current state-of-the-art methods on established benchmarks such as FashionIQ and CIRR.
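CASE fuses the two query modalities early, through a cross-attention module. As a rough illustration only, the sketch below implements generic single-head scaled dot-product cross-attention in plain Python, assuming that text-token embeddings act as queries and image-patch embeddings supply the keys and values. The helper names (`matmul`, `cross_attention`) and the toy projection matrices are hypothetical; this is not the CASE code from levymsn/LaSCo, and it omits multiple heads, residual connections, layer normalization, and the auxiliary training task.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text: Matrix, image: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical single-head cross-attention: each text token attends to all image patches."""
    q = matmul(text, w_q)   # queries from text tokens, shape (n_text, d)
    k = matmul(image, w_k)  # keys from image patches, shape (n_img, d)
    v = matmul(image, w_v)  # values from image patches, shape (n_img, d)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))  # similarity of every text token to every patch
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)  # image-conditioned text features, shape (n_text, d)


# Usage with toy 2-d embeddings and identity projections (assumed values).
identity = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[0.5, -0.2], [1.0, 1.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = cross_attention(text_tokens, image_patches, identity, identity, identity)
print(len(fused), len(fused[0]))  # 2 2
```

Each output row is a softmax-weighted mix of the image values, so the text representation is conditioned on the reference image before any retrieval scoring happens; this is what "early" fusion refers to, as opposed to combining separately computed image and text embeddings at the end.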

Code

levymsn/LaSCo (official)

Tasks

Composed Image Retrieval (CoIR)

Image Retrieval

Retrieval

Datasets

Introduced in the Paper:

LaSCo

Used in the Paper:

MS COCO

Visual Question Answering v2.0

Fashion IQ

CIRR

Results from the Paper

Ranked #1 on Image Retrieval on LaSCo

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Image Retrieval	CIRR	CASE (Pre-trained on LaSCo.Ca)	(Recall@5+Recall_subset@1)/2	78.25	#3
Image Retrieval	CIRR	CASE	(Recall@5+Recall_subset@1)/2	77.5	#4
Image Retrieval	Fashion IQ	CASE	(Recall@10+Recall@50)/2	59.74	#4
Image Retrieval	LaSCo	CASE	Recall@1 (%)	7.08	#1
Image Retrieval	LaSCo	BLIP4CIR	Recall@1 (%)	4.26	#2
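The CIRR and Fashion IQ metrics in this table are averages of two Recall@K values, expressed in percent. A minimal sketch of how such numbers are computed, assuming the 1-based rank of each query's ground-truth target is already known. The inputs `ranks` and `subset_ranks` are hypothetical; `subset_ranks` is assumed to hold the target's rank within the small per-query candidate subset that CIRR uses for Recall_subset. The toy ranks are invented for illustration and have no connection to the reported results.

```python
from typing import Sequence


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Percentage of queries whose target is ranked within the top k (ranks are 1-based)."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def cirr_average(ranks: Sequence[int], subset_ranks: Sequence[int]) -> float:
    """(Recall@5 + Recall_subset@1) / 2, assuming subset_ranks come from the per-query subset."""
    return (recall_at_k(ranks, 5) + recall_at_k(subset_ranks, 1)) / 2


def fashioniq_average(ranks: Sequence[int]) -> float:
    """(Recall@10 + Recall@50) / 2 over a single pool of queries."""
    return (recall_at_k(ranks, 10) + recall_at_k(ranks, 50)) / 2


# Toy example with five made-up queries.
ranks = [1, 3, 7, 12, 60]        # rank of the target in the full gallery
subset_ranks = [1, 2, 1, 3, 1]   # rank of the target within its candidate subset
print(cirr_average(ranks, subset_ranks))  # 50.0
print(fashioniq_average(ranks))           # 70.0
```

Averaging a coarse metric (Recall@5 over the whole gallery) with a fine-grained one (Recall@1 among visually similar candidates) rewards models that both find the right region of the gallery and distinguish near-duplicates.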

Methods

Concatenated Skip Connection • Cross-Attention Module • Softmax
