WebVid-CoVR

Introduced by Ventura et al. in CoVR: Learning Composed Video Retrieval from Web Video Captions

The WebVid-CoVR dataset is a collection of video-text-video triplets for the task of composed video retrieval (CoVR). CoVR searches for videos that match both a visual query (an image or a video) and a query text, where the text typically specifies the desired modification to the visual query.
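
To make the triplet structure and the retrieval step concrete, here is a minimal sketch in Python. It is not the method from the paper: the `Triplet` fields, the `rank_targets` helper, the toy 2-D embeddings, and the element-wise sum used to combine the query and the text are all assumptions chosen for illustration. A real CoVR model would use a learned encoder to fuse the two inputs.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Triplet:
    """One video-text-video example (field names are hypothetical)."""
    query_video: str        # id of the query video
    modification_text: str  # text describing the desired change
    target_video: str       # id of the video that should be retrieved


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_targets(query_vec, text_vec, gallery):
    """Rank gallery video ids by similarity to the composed query.

    Assumption: the composed query is the element-wise sum of the two
    embeddings. This stands in for a learned fusion model.
    """
    composed = [q + t for q, t in zip(query_vec, text_vec)]
    return sorted(gallery, key=lambda vid: cosine(composed, gallery[vid]),
                  reverse=True)


# Toy data: made-up ids, text, and embeddings, not taken from the dataset.
triplet = Triplet("video_a", "make it snowy", "video_b")
gallery = {
    "video_a": [1.0, 0.0],
    "video_b": [1.0, 1.0],
    "video_c": [0.0, -1.0],
}
ranking = rank_targets([1.0, 0.0], [0.0, 1.0], gallery)
retrieved_correctly = ranking[0] == triplet.target_video
```

In this toy setup the target video ranks first because its embedding points in the same direction as the composed query. Evaluating on a test set would repeat this ranking for every triplet and check how often the target appears near the top.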

The dataset is generated automatically from web-scraped video-caption pairs, with a language model producing the modification text. It contains 1.6 million triplets covering diverse content and variations, plus a manually annotated test set of 2.5K triplets that can be used to evaluate CoVR models.

Benchmarks

Task	Dataset Variant	Best Model
Composed Video Retrieval (CoVR)	WebVid-CoVR	CoVR-BLIP

Similar Datasets

Fashion IQ

CIRR

LaSCo
