Image Retrieval

666 papers with code • 54 benchmarks • 75 datasets

Image Retrieval is a fundamental and long-standing computer vision task that involves finding images similar to a provided query from a large database. It's often considered as a form of fine-grained, instance-level classification. Not just integral to image recognition alongside classification and detection, it also holds substantial business value by helping users discover images aligning with their interests or requirements, guided by visual similarity or other parameters.

( Image credit: DELF )

Benchmarks

Add a Result

These leaderboards are used to track progress in Image Retrieval

Dataset	Best Model	Compare
ROxford (Medium)	Hypergraph propagation+Community selection	See all
RParis (Medium)	Hypergraph propagation	See all
ROxford (Hard)	SuperGlobal	See all
RParis (Hard)	SuperGlobal	See all
CREPE (Compositional REPresentation Evaluation)	ViT-L-14 (LAION400M)	See all
Flickr30K 1K test	X-VLM (base)	See all
Fashion IQ	SPRC	See all
SOP	Unicom+ViT-L@336px	See all
Oxf5k	Offline Diffusion	See all
Flickr30k-CN	InternVL-G-FT	See all
CIRR	SPRC	See all
iNaturalist	Unicom+ViT-L@336px	See all
Oxf105k	Offline Diffusion	See all
MUGE Retrieval	CN-CLIP (ViT-H/14)	See all
COCO-CN	CN-CLIP (ViT-H/14)	See all
CUB-200-2011	CGD (MG/SG)	See all
CARS196	CGD (MG/SG)	See all
Par6k	Offline Diffusion	See all
Par106k	Offline Diffusion	See all
In-Shop	CGD (SG/GS)	See all
Flickr30k	BLIP-2 ViT-G (zero-shot, 1K test set)	See all
MS COCO	BLIP-2 ViT-G (fine-tuned)	See all
AmsterTime	DINOv2 distilled (ViT-L/14 frozen)	See all
PhotoChat	PaCE	See all
ConQA Descriptive	CLIP	See all
ConQA Conceptual	CLIP	See all
DeepFashion - Consumer-to-shop	CTL Model (ResNet50-IBN-A, 320x320)	See all
Exact Street2Shop	CTL Model (ResNet50-IBN-A, 320x320)	See all
LaSCo	CASE	See all
DeepPatent	SwinV2	See all
24/7 Tokyo	HED-N-GAN	See all
street2shop - topwear	Ranknet	See all
INRIA Holidays	MultiGrain R50 @ 800	See all
Paris6k	IME layer	See all
Oxford5k	GNN-Reranking	See all
AIC-ICC	ERNIE-ViL2.0	See all
WIT	WIT-ALL	See all
CBVS	UniCLP	See all
NUS-WIDE	LESA	See all
DeepFashion	STIR	See all
Google Landmarks Dataset v2 (retrieval, testing)	ResNet101+ArcFace GLDv2-train-clean	See all
Google Landmarks Dataset v2 (retrieval, validation)	ResNet101+ArcFace GLDv2-train-clean	See all
INSTRE	IME layer	See all
CIFAR-10	Custom: 3 conv + 2 fcn	See all
ImageCoDe	ContextualCLIP	See all
PKU-Reid	IHDA	See all
PKU SketchRe-ID Dataset	IHDA	See all
FETA Car-Manuals	FETA's CLIP-MIL (Many-Shot Image-to-text)	See all
FooDI-ML (Global)	ADAPT-I2T	See all
FooDI-ML (Spain)	ADAPT-I2T	See all
Localized Narratives	OPT	See all
ICFG-PEDES	SSAN	See all
RUC-CAS-WenLan	CMCL	See all
ROxford Medium without fine-tuning	HesAff–rSIFT–VLAD	See all

Show all 54 benchmarks

Collapse benchmarks

Libraries

Use these libraries to find Image Retrieval models and implementations

huggingface/transformers

4 papers

124,984

OML-Team/open-metric-learning

4 papers

762

kornia/kornia

2 papers

9,377

salesforce/lavis

2 papers

8,724

See all 10 libraries.

Datasets

Subtasks

Medical Image Retrieval

Multi-Label Image Retrieval

Face Image Retrieval

Video-to-Shop

Image Instance Retrieval

Semi-Supervised Sketch Based Image Retrieval

Chat-based Image Retrieval

Latest papers with no code

Most implemented Social Latest No code

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

no code yet • 23 Apr 2024

Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.

Paper
Add Code

Collaborative Visual Place Recognition through Federated Learning

no code yet • 20 Apr 2024

Visual Place Recognition (VPR) aims to estimate the location of an image by treating it as a retrieval problem.

Paper
Add Code

Shotit: compute-efficient image-to-video search engine for the cloud

no code yet • 18 Apr 2024

We present Shotit, a cloud-native image-to-video search engine that tailors this search scenario in a compute-efficient approach.

Paper
Add Code

Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

no code yet • 17 Apr 2024

The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and a modified text.

Paper
Add Code

Spatial-Aware Image Retrieval: A Hyperdimensional Computing Approach for Efficient Similarity Hashing

no code yet • 17 Apr 2024

Our work introduces a transformative image hashing framework enabling spatial-aware conditional retrieval.

Paper
Add Code

Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping

no code yet • 9 Apr 2024

A crucial practical aspect for an object identification model is to be flexible in input size.

Paper
Add Code

Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning

no code yet • 6 Apr 2024

It is a step-by-step linear reasoning process that adjusts the length of the chain to improve the performance of generated prompts.

Paper
Add Code

On the Estimation of Image-matching Uncertainty in Visual Place Recognition

no code yet • 31 Mar 2024

In Visual Place Recognition (VPR) the pose of a query image is estimated by comparing the image to a map of reference images with known reference poses.

Paper
Add Code

Do Vision-Language Models Understand Compound Nouns?

no code yet • 30 Mar 2024

We curate Compun, a novel benchmark with 400 unique and commonly used CNs, to evaluate the effectiveness of VLMs in interpreting CNs.

Paper
Add Code

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

no code yet • 28 Mar 2024

Image retrieval, i. e., finding desired images given a reference image, inherently encompasses rich, multi-faceted search intents that are difficult to capture solely using image-based measures.

Paper
Add Code

Image Retrieval

Benchmarks Add a Result

Libraries

Datasets

Subtasks

Latest papers with no code

Content

Benchmarks

Add a Result