The WebVid-CoVR dataset is a collection of video-text-video triplets that can be used for the task of composed video retrieval (CoVR). CoVR is a task that involves searching for videos that match both a query image and a query text. The text typically specifies the desired modification to the query image.

The WebVid-CoVR dataset is automatically generated from web-scraped video-caption pairs, using a language model to generate the modification text. The dataset contains 1.6 million triplets, with diverse content and variations. The dataset also includes a manually annotated test set of 2.5K triplets, which can be used to evaluate CoVR models.

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


License


Modalities


Languages