TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Retrieval	CIRR	CoVR-BLIP	(Recall@5+Recall_subset@1)/2	76.81	# 5
Zero-Shot Composed Image Retrieval (ZS-CIR)	CIRR	CoVR-BLIP	R@5	66.7	# 2
Composed Image Retrieval (CoIR)	CIRR	CoVR-BLIP	(Recall@5+Recall_subset@1)/2	76.81	# 1
Zero-Shot Composed Image Retrieval (ZS-CIR)	Fashion IQ	CoVR-BLIP	(Recall@10+Recall@50)/2	36.17	# 9
Image Retrieval	Fashion IQ	CoVR-BLIP	(Recall@10+Recall@50)/2	59.39	# 5
Composed Video Retrieval (CoVR)	WebVid-CoVR	CoVR-BLIP	R@5	79.93	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/covr-learning-composed-video-retrieval-from/composed-image-retrieval-coir-on-cirr-1)](https://paperswithcode.com/sota/composed-image-retrieval-coir-on-cirr-1?p=covr-learning-composed-video-retrieval-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/covr-learning-composed-video-retrieval-from/composed-video-retrieval-covr-on-covr)](https://paperswithcode.com/sota/composed-video-retrieval-covr-on-covr?p=covr-learning-composed-video-retrieval-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/covr-learning-composed-video-retrieval-from/zero-shot-composed-image-retrieval-zs-cir-on-1)](https://paperswithcode.com/sota/zero-shot-composed-image-retrieval-zs-cir-on-1?p=covr-learning-composed-video-retrieval-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/covr-learning-composed-video-retrieval-from/image-retrieval-on-cirr)](https://paperswithcode.com/sota/image-retrieval-on-cirr?p=covr-learning-composed-video-retrieval-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/covr-learning-composed-video-retrieval-from/image-retrieval-on-fashion-iq)](https://paperswithcode.com/sota/image-retrieval-on-fashion-iq?p=covr-learning-composed-video-retrieval-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/covr-learning-composed-video-retrieval-from/zero-shot-composed-image-retrieval-zs-cir-on-2)](https://paperswithcode.com/sota/zero-shot-composed-image-retrieval-zs-cir-on-2?p=covr-learning-composed-video-retrieval-from)`

CoVR: Learning Composed Video Retrieval from Web Video Captions

28 Aug 2023 · Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol

Composed Image Retrieval (CoIR) has recently gained popularity as a task that considers both text and image queries together, to search for relevant images in a database. Most CoIR approaches require manually annotated datasets, comprising image-text-image triplets, where the text describes a modification from the query image to the target image. However, manual curation of CoIR triplets is expensive and prevents scalability. In this work, we instead propose a scalable automatic dataset creation methodology that generates triplets given video-caption pairs, while also expanding the scope of the task to include composed video retrieval (CoVR). To this end, we mine paired videos with a similar caption from a large database, and leverage a large language model to generate the corresponding modification text. Applying this methodology to the extensive WebVid2M collection, we automatically construct our WebVid-CoVR dataset, resulting in 1.6 million triplets. Moreover, we introduce a new benchmark for CoVR with a manually annotated evaluation set, along with baseline results. Our experiments further demonstrate that training a CoVR model on our dataset effectively transfers to CoIR, leading to improved state-of-the-art performance in the zero-shot setup on both the CIRR and FashionIQ benchmarks. Our code, datasets, and models are publicly available at https://imagine.enpc.fr/~ventural/covr.
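The pair-mining step in the abstract can be illustrated with a minimal sketch. The abstract only says the paired videos have "a similar caption"; the criterion used here (two captions of equal length that differ in exactly one word) is an assumption made for illustration, and the function name `mine_caption_pairs` is hypothetical. The language-model step that turns each pair into a modification text is not shown.

```python
from collections import defaultdict


def mine_caption_pairs(captions):
    """Find caption pairs that differ in exactly one word.

    Assumption: "similar caption" is approximated as same word count with
    a single differing word. Returns sorted tuples (i, j, word_i, word_j)
    with i < j.
    """
    # Bucket captions by "caption with the word at position pos removed".
    # Two captions land in the same bucket only if they agree everywhere
    # except possibly at pos.
    buckets = defaultdict(list)
    for idx, caption in enumerate(captions):
        words = caption.lower().split()
        for pos in range(len(words)):
            key = (len(words), pos, tuple(words[:pos] + words[pos + 1:]))
            buckets[key].append((idx, words[pos]))

    pairs = set()
    for members in buckets.values():
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                (i, word_i), (j, word_j) = members[a], members[b]
                if word_i != word_j:  # skip identical captions
                    pairs.add((i, j, word_i, word_j))
    return sorted(pairs)


captions = [
    "a dog runs on the beach",
    "a cat runs on the beach",
    "a dog runs on the sand",
    "sunset over mountains",
]
pairs = mine_caption_pairs(captions)
```

Each mined pair (query video, target video) together with the two differing words would then be the input from which a modification text is generated, yielding one triplet.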

Code

lucas-ventura/CoVR official

Tasks

Composed Image Retrieval (CoIR)

Composed Video Retrieval (CoVR)

Image Retrieval

Language Modelling

Large Language Model

Retrieval

Video Retrieval

Zero-Shot Composed Image Retrieval (ZS-CIR)

Datasets

Introduced in the Paper:

WebVid-CoVR

Used in the Paper:

WebVid

Fashion IQ

CIRR

Results from the Paper

Ranked #1 on Composed Video Retrieval (CoVR) on WebVid-CoVR

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Image Retrieval	CIRR	CoVR-BLIP	(Recall@5+Recall_subset@1)/2	76.81	# 5
Zero-Shot Composed Image Retrieval (ZS-CIR)	CIRR	CoVR-BLIP	R@5	66.7	# 2
Composed Image Retrieval (CoIR)	CIRR	CoVR-BLIP	(Recall@5+Recall_subset@1)/2	76.81	# 1
Zero-Shot Composed Image Retrieval (ZS-CIR)	Fashion IQ	CoVR-BLIP	(Recall@10+Recall@50)/2	36.17	# 9
Image Retrieval	Fashion IQ	CoVR-BLIP	(Recall@10+Recall@50)/2	59.39	# 5
Composed Video Retrieval (CoVR)	WebVid-CoVR	CoVR-BLIP	R@5	79.93	# 1
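
For readers unfamiliar with the metric names in the table, the following is a minimal sketch of how Recall@K and an averaged metric such as (Recall@10+Recall@50)/2 are typically computed, assuming one ground-truth target per query. The function names and the toy rankings are illustrative and not taken from the paper's code. Recall_subset@1 on CIRR is understood to apply the same computation over a restricted candidate subset per query; that variant is not shown here.

```python
def recall_at_k(ranked_ids, target_ids, k):
    """Percentage of queries whose target appears in the top-k results.

    ranked_ids: one ranked list of gallery ids per query (best first).
    target_ids: the single ground-truth id for each query.
    """
    hits = sum(
        1
        for ranking, target in zip(ranked_ids, target_ids)
        if target in ranking[:k]
    )
    return 100.0 * hits / len(target_ids)


def mean_recall(ranked_ids, target_ids, ks):
    """Average of Recall@K over several cutoffs, e.g. ks=(10, 50)."""
    return sum(recall_at_k(ranked_ids, target_ids, k) for k in ks) / len(ks)


# Toy example: four queries over a three-item gallery.
ranked = [
    ["a", "b", "c"],
    ["c", "a", "b"],
    ["b", "c", "a"],
    ["a", "c", "b"],
]
targets = ["a", "b", "a", "c"]

r1 = recall_at_k(ranked, targets, 1)      # only the first query hits
r2 = recall_at_k(ranked, targets, 2)      # first and last queries hit
avg = mean_recall(ranked, targets, (1, 2))
```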

Methods

CoVR
