Composed Image Retrieval (or Image Retrieval conditioned on Language Feedback) is a relatively new retrieval task in which the input query consists of an image and a short textual description of how to modify that image.
34 PAPERS • 3 BENCHMARKS
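As a rough illustration of how a composed query can be scored, the sketch below uses one simple late-fusion baseline. It assumes the image and the modification text have already been embedded into a shared vector space, for example by a CLIP-style dual encoder (not shown). It sums the two normalized embeddings and ranks gallery images by cosine similarity to the result. The function names and toy vectors are hypothetical, and published CIR methods typically learn a more expressive fusion than a plain sum.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (assumes v is non-zero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def compose_query(image_emb, text_emb):
    """Hypothetical late-fusion baseline: normalized sum of image and text embeddings."""
    img, txt = normalize(image_emb), normalize(text_emb)
    return normalize([a + b for a, b in zip(img, txt)])

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def retrieve(image_emb, text_emb, gallery, k=3):
    """Return the ids of the k gallery items (id -> embedding) closest to the composed query.

    Ties keep the gallery's insertion order because Python's sort is stable.
    """
    q = compose_query(image_emb, text_emb)
    ranked = sorted(gallery, key=lambda gid: cosine(q, gallery[gid]), reverse=True)
    return ranked[:k]

# Toy 2-D example: the image points along x, the text along y,
# so the gallery item lying between them ("c") ranks first.
gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(retrieve([1.0, 0.0], [0.0, 1.0], gallery, k=1))
```

Summing unit vectors gives the image and the text equal weight. A weighted sum or a learned combiner is the usual next step when one modality should dominate.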
CIRCO (Composed Image Retrieval on Common Objects in Context) is an open-domain benchmark dataset for Composed Image Retrieval (CIR) built from real-world images in the COCO 2017 unlabeled set. It is the first CIR dataset with multiple ground truths per query, and it aims to address the problem of false negatives in existing datasets. CIRCO comprises 1020 queries, randomly split into 220 for validation and 800 for testing, with an average of 4.53 ground truths per query.
13 PAPERS • 1 BENCHMARK
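Because CIRCO queries have several ground truths, a metric that credits every correct image in the top K is more informative than checking for a single target. The sketch below computes mean Average Precision at K (mAP@K), normalizing each query by min(K, number of ground truths). It assumes this normalization matches CIRCO-style evaluation, so treat it as an illustrative reimplementation rather than the official evaluation script. The function names are hypothetical.

```python
def average_precision_at_k(ranked, ground_truths, k):
    """AP@K for one query that may have several ground-truth targets."""
    gt = set(ground_truths)
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in gt:
            hits += 1
            score += hits / rank  # precision at this rank, counted only where the item is relevant
    return score / min(k, len(gt))

def map_at_k(rankings, ground_truths, k):
    """Mean AP@K over queries; rankings[i] is the ranked list for the i-th query."""
    aps = [average_precision_at_k(r, g, k) for r, g in zip(rankings, ground_truths)]
    return sum(aps) / len(aps)

# One query with a perfect ranking, one with its two targets at ranks 2 and 4.
print(map_at_k([["a", "b"], ["x", "a", "y", "b"]], [{"a", "b"}, {"a", "b"}], k=4))
```

With a single ground truth per query, a correct but unlabelled image ranked first would push the labelled target down and be scored as an error. Counting all annotated ground truths reduces that false-negative penalty.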
Large Scale Composed Image Retrieval (LaSCo) is a new dataset for Composed Image Retrieval (CoIR) that is roughly ten times larger than existing CoIR datasets.
2 PAPERS • 1 BENCHMARK
The WebVid-CoVR dataset is a collection of video-text-video triplets for the task of composed video retrieval (CoVR). In CoVR, the goal is to retrieve videos that match both a query image and a query text, where the text typically specifies the desired modification to the query image.
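Each WebVid-CoVR triplet has exactly one target video. A simple way to score a retrieval system on such triplets is therefore Recall@K, the fraction of queries whose target appears among the top K results. The sketch below assumes a hypothetical `Triplet` record and a ranked list of video ids already produced for each triplet. It is not the dataset's official tooling.

```python
from typing import List, NamedTuple

class Triplet(NamedTuple):
    """Hypothetical record: query item (video or frame), modification text, single target video."""
    query_id: str
    modification: str
    target_id: str

def recall_at_k(triplets: List[Triplet], rankings: List[List[str]], k: int) -> float:
    """Fraction of triplets whose target id appears in the top-k of its ranking.

    rankings[i] is the ranked list of video ids returned for triplets[i].
    """
    hits = sum(t.target_id in ranked[:k] for t, ranked in zip(triplets, rankings))
    return hits / len(triplets)

triplets = [
    Triplet("q1", "make it night-time", "v2"),
    Triplet("q2", "add a dog", "v9"),
]
rankings = [["v1", "v2", "v3"], ["v1", "v2"]]
print(recall_at_k(triplets, rankings, k=2))
```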
PatternCom is a composed image retrieval benchmark based on PatternNet, a large-scale, high-resolution remote sensing image retrieval dataset with 38 classes of 800 images each, all 256×256 pixels. In PatternCom, selected classes are depicted in the query images, and each query image is paired with a query text that defines an attribute relevant to that class. For instance, query images of “swimming pools” are combined with text queries defining “shape” as “rectangular”, “oval”, or “kidney-shaped”. In total, PatternCom includes six attributes, each covering up to four different classes, and each attribute can take two to five values per class. The number of positives per query ranges from 2 to 1345, and there are more than 21k queries in total.
1 PAPER • 1 BENCHMARK
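The PatternCom query construction can be sketched under some explicit assumptions. Each image record carries a class label and, where applicable, a hypothetical per-attribute label such as `"shape": "oval"`. Each query pairs an image of a selected class with one attribute value as its text. The positives are the other images of the same class labelled with that value, and combinations with no positives are dropped. This is one plausible reading of the description above, not the authors' exact pipeline, and `build_queries` and the record keys are hypothetical.

```python
from itertools import product

def build_queries(images, attribute, classes, values):
    """Enumerate PatternCom-style composed queries for one attribute.

    images: list of dicts with hypothetical keys 'id', 'cls' and optional attribute labels.
    Assumption: the query image itself is excluded from its own positives.
    """
    queries = []
    for img, value in product(images, values):
        if img["cls"] not in classes:
            continue  # only the selected classes appear in query images
        positives = [
            other["id"]
            for other in images
            if other["id"] != img["id"]
            and other["cls"] == img["cls"]
            and other.get(attribute) == value
        ]
        if positives:  # skip combinations that have nothing to retrieve
            queries.append({"image": img["id"], "text": value, "positives": positives})
    return queries

images = [
    {"id": "p1", "cls": "swimming_pool", "shape": "oval"},
    {"id": "p2", "cls": "swimming_pool", "shape": "rectangular"},
    {"id": "p3", "cls": "swimming_pool", "shape": "rectangular"},
    {"id": "c1", "cls": "tennis_court"},
]
for q in build_queries(images, "shape", {"swimming_pool"}, ["rectangular", "oval"]):
    print(q)
```

Enumerating every (image, value) pair within the selected classes is what lets a modest number of classes and attribute values yield tens of thousands of queries, with positive counts that vary widely across values.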