Image Retrieval

672 papers with code • 54 benchmarks • 75 datasets

Image Retrieval is a fundamental and long-standing computer vision task: given a query, find the images in a large database that are most similar to it. It is often regarded as a form of fine-grained, instance-level classification. Alongside classification and detection, it is an integral part of image recognition, and it also has substantial commercial value: it helps users discover images that match their interests or requirements, based on visual similarity or other criteria.
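
Many retrieval systems implement this as a nearest-neighbor search: a model maps every image to a feature vector (an embedding), and database images are ranked by how similar their embeddings are to the query's. The sketch below assumes the embeddings have already been computed; the toy 3-D vectors and the helper names `cosine_similarity` and `retrieve` are hypothetical, and the brute-force scan stands in for the approximate nearest-neighbor indexes that large databases typically use.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as unrelated to everything
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float],
             database: Dict[str, Sequence[float]],
             k: int = 3) -> List[Tuple[str, float]]:
    """Return the k database images most similar to the query, best first."""
    # Brute-force scan: score every database embedding against the query.
    scored = [(image_id, cosine_similarity(query, emb))
              for image_id, emb in database.items()]
    # Highest similarity first, then keep the top k.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-D embeddings standing in for real model outputs.
database = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
    "tree_1.jpg": [0.1, 0.9, 0.2],
}
query = [0.85, 0.15, 0.05]
for image_id, score in retrieve(query, database, k=2):
    print(f"{image_id}: {score:.3f}")
```

If all embeddings are L2-normalized once at indexing time, cosine similarity reduces to a plain dot product, which is what most vector indexes compute.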

Benchmarks

These leaderboards track progress in Image Retrieval.

Dataset	Best Model
ROxford (Medium)	Hypergraph propagation+Community selection
RParis (Medium)	Hypergraph propagation
ROxford (Hard)	SuperGlobal
RParis (Hard)	SuperGlobal
CREPE (Compositional REPresentation Evaluation)	ViT-L-14 (LAION400M)
Flickr30K 1K test	X-VLM (base)
Fashion IQ	SPRC
SOP	Unicom+ViT-L@336px
Oxf5k	Offline Diffusion
Flickr30k-CN	InternVL-G-FT
CIRR	SPRC
iNaturalist	Unicom+ViT-L@336px
Oxf105k	Offline Diffusion
MUGE Retrieval	CN-CLIP (ViT-H/14)
COCO-CN	CN-CLIP (ViT-H/14)
CUB-200-2011	CGD (MG/SG)
CARS196	CGD (MG/SG)
Par6k	Offline Diffusion
Par106k	Offline Diffusion
In-Shop	CGD (SG/GS)
Flickr30k	BLIP-2 ViT-G (zero-shot, 1K test set)
MS COCO	BLIP-2 ViT-G (fine-tuned)
AmsterTime	DINOv2 distilled (ViT-L/14 frozen)
PhotoChat	PaCE
ConQA Descriptive	CLIP
ConQA Conceptual	CLIP
DeepFashion - Consumer-to-shop	CTL Model (ResNet50-IBN-A, 320x320)
Exact Street2Shop	CTL Model (ResNet50-IBN-A, 320x320)
LaSCo	CASE
DeepPatent	SwinV2
24/7 Tokyo	HED-N-GAN
street2shop - topwear	Ranknet
INRIA Holidays	MultiGrain R50 @ 800
Paris6k	IME layer
Oxford5k	GNN-Reranking
AIC-ICC	ERNIE-ViL2.0
WIT	WIT-ALL
CBVS	UniCLP
NUS-WIDE	LESA
DeepFashion	STIR
Google Landmarks Dataset v2 (retrieval, testing)	ResNet101+ArcFace GLDv2-train-clean
Google Landmarks Dataset v2 (retrieval, validation)	ResNet101+ArcFace GLDv2-train-clean
INSTRE	IME layer
CIFAR-10	Custom: 3 conv + 2 fcn
ImageCoDe	ContextualCLIP
PKU-Reid	IHDA
PKU SketchRe-ID Dataset	IHDA
FETA Car-Manuals	FETA's CLIP-MIL (Many-Shot Image-to-text)
FooDI-ML (Global)	ADAPT-I2T
FooDI-ML (Spain)	ADAPT-I2T
Localized Narratives	OPT
ICFG-PEDES	SSAN
RUC-CAS-WenLan	CMCL
ROxford Medium without fine-tuning	HesAff–rSIFT–VLAD
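
These leaderboards rank methods by metrics computed from each query's ranked result list: the landmark benchmarks such as ROxford and RParis report mean average precision (mAP), while image-text benchmarks such as Flickr30K and MS COCO typically report Recall@K. The sketch below computes non-interpolated average precision for each query and averages it over queries. It is a simplified illustration, not any benchmark's official evaluation script (ROxford and RParis, for instance, also exclude "junk" images from scoring); the function names and toy rankings are hypothetical.

```python
from typing import List, Sequence, Set


def average_precision(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
            # Precision at this cut-off, recorded at every relevant hit.
            precision_sum += hits / rank
    # Relevant images that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(rankings: List[Sequence[str]],
                           relevants: List[Set[str]]) -> float:
    """Mean of the per-query average precision values."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)


# Query 1: relevant images at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2 = 5/6.
# Query 2: the single relevant image at rank 2 -> AP = 1/2.
rankings = [["a", "x", "b", "y"], ["z", "c", "w"]]
relevants = [{"a", "b"}, {"c"}]
print(mean_average_precision(rankings, relevants))
```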

Libraries

Use these libraries to find Image Retrieval models and implementations.

huggingface/transformers: 4 papers, 126,353 GitHub stars

OML-Team/open-metric-learning: 4 papers, 781 GitHub stars

kornia/kornia: 2 papers, 9,474 GitHub stars

salesforce/lavis: 2 papers, 8,890 GitHub stars

Ten libraries are listed in total; the four above are the top entries.

Subtasks

Medical Image Retrieval

Multi-Label Image Retrieval

Face Image Retrieval

Video-to-Shop

Image Instance Retrieval

Semi-Supervised Sketch Based Image Retrieval

Chat-based Image Retrieval

Most implemented papers

Kornia: an Open Source Differentiable Computer Vision Library for PyTorch

kornia/kornia • 5 Oct 2019

This work presents Kornia, an open-source computer vision library consisting of a set of differentiable routines and modules for solving generic computer vision problems.

Unifying Deep Local and Global Features for Image Search

tensorflow/models • ECCV 2020

Image retrieval is the problem of searching an image database for items that are similar to a query image.

Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval

cvdfoundation/google-landmark • 3 Apr 2020

GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels.

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

microsoft/Oscar • ECCV 2020

Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.

SEMICON: A Learning-to-hash Solution for Large-scale Fine-grained Image Retrieval

njust-vipgroup/semicon • 28 Sep 2022

In this paper, we propose Suppression-Enhancing Mask based attention and Interactive Channel transformatiON (SEMICON) to learn binary hash codes for dealing with large-scale fine-grained image retrieval tasks.
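
SEMICON belongs to the learning-to-hash family, in which each image is represented by a short binary code so that a large database can be searched with cheap Hamming-distance comparisons. SEMICON learns its codes with a deep network; the sketch below instead uses random hyperplane projections (a classic locality-sensitive hashing scheme, not SEMICON's method) only to show how binary codes are derived from real-valued features and compared. The feature vectors, the 16-bit code length, and all function names are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian hyperplanes; each one contributes one bit of the code."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(features: Sequence[float], planes: List[List[float]]) -> int:
    """Pack the sign of each projection into the bits of an integer."""
    code = 0
    for bit, plane in enumerate(planes):
        if sum(f * p for f, p in zip(features, plane)) >= 0.0:
            code |= 1 << bit
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def search(query_code: int, index: Dict[str, int], k: int = 2) -> List[Tuple[str, int]]:
    """Rank database codes by Hamming distance to the query (smaller is closer)."""
    ranked = sorted(index.items(), key=lambda item: hamming(query_code, item[1]))
    return [(name, hamming(query_code, code)) for name, code in ranked[:k]]


planes = make_hyperplanes(dim=4, n_bits=16)
# Hypothetical real-valued features for three database images.
features = {
    "bird_a.jpg": [1.0, 0.9, 0.0, 0.1],
    "bird_b.jpg": [0.9, 1.0, 0.1, 0.0],
    "ship.jpg": [-1.0, 0.0, 0.9, -0.8],
}
index = {name: hash_code(vec, planes) for name, vec in features.items()}
print(search(hash_code([0.95, 0.95, 0.05, 0.05], planes), index))
```

With random projections, two vectors disagree on a bit with probability proportional to the angle between them, so nearby features tend to get codes with small Hamming distance; learned hashing methods aim to make that correspondence much sharper for the data at hand.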

Deep Image Retrieval: Learning global representations for image search

tensorflow/models • 5 Apr 2016

We propose a novel approach for instance-level image retrieval.
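
A global representation is a single fixed-length vector per image, so retrieval reduces to comparing vectors. One well-known way to build such a vector from a convolutional feature map is regional maximum activations of convolutions (R-MAC), which this line of work builds on: max-pool each channel over several image regions, L2-normalize each regional vector, sum them, and normalize again. The sketch below is a simplified, untrained version (fixed regions, no PCA-whitening, no learned region proposals or training) and is not the paper's exact pipeline; the toy feature map and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]
Region = Tuple[int, int, int, int]    # (row0, col0, row1, col1), end-exclusive


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def region_max_pool(fmap: FeatureMap, region: Region) -> List[float]:
    """Maximum activation of every channel inside one rectangular region."""
    r0, c0, r1, c1 = region
    return [max(channel[r][c] for r in range(r0, r1) for c in range(c0, c1))
            for channel in fmap]


def rmac_like_descriptor(fmap: FeatureMap, regions: List[Region]) -> List[float]:
    """Sum of L2-normalized regional max-pooled vectors, then L2-normalized."""
    total = [0.0] * len(fmap)
    for region in regions:
        pooled = l2_normalize(region_max_pool(fmap, region))
        total = [t + p for t, p in zip(total, pooled)]
    return l2_normalize(total)


# Hypothetical 2-channel, 4x4 feature map from some CNN layer.
fmap = [
    [[0.0, 1.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 3.0]],
    [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
]
# The whole map plus its four quadrants.
regions = [(0, 0, 4, 4), (0, 0, 2, 2), (0, 2, 2, 4), (2, 0, 4, 2), (2, 2, 4, 4)]
print(rmac_like_descriptor(fmap, regions))
```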

Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations

gtolias/mom • CVPR 2017

The diffusion is carried out on descriptors of overlapping image regions rather than on a global image descriptor like in previous approaches.
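
Diffusion re-ranks the database by propagating similarity from the query through a nearest-neighbor graph built over the database descriptors, so that images linked to the query by chains of close neighbors can score highly even when they are not directly similar to it. The paper diffuses over regional descriptors; for brevity, the sketch below uses one global descriptor per image and reaches the diffusion solution by fixed-point iteration of f = alpha * S f + (1 - alpha) * y, where S is a symmetrically normalized kNN affinity matrix and y holds the query's initial similarities. This is the common manifold-ranking formulation; the paper's exact solver and weighting may differ, and the descriptors, alpha, k, and function names are illustrative assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_affinity(vectors: List[Sequence[float]], k: int) -> Matrix:
    """Symmetric kNN graph weighted by positive cosine similarity."""
    n = len(vectors)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(((cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i),
                      reverse=True)
        for s, j in sims[:k]:
            if s > 0:
                # Keep the edge if either endpoint lists the other as a neighbor.
                A[i][j] = A[j][i] = s
    return A


def normalize(A: Matrix) -> Matrix:
    """S = D^(-1/2) A D^(-1/2); isolated nodes get all-zero rows."""
    deg = [sum(row) for row in A]
    n = len(A)
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def query_vector(query: Sequence[float], database: List[Sequence[float]], k: int) -> List[float]:
    """Initial scores y: the query's similarity to its k nearest database items."""
    sims = sorted(((cosine(query, x), i) for i, x in enumerate(database)), reverse=True)
    y = [0.0] * len(database)
    for s, i in sims[:k]:
        y[i] = max(s, 0.0)
    return y


def diffuse(S: Matrix, y: List[float], alpha: float = 0.9, iters: int = 200) -> List[float]:
    """Fixed-point iteration f <- alpha * S f + (1 - alpha) * y (converges for alpha < 1)."""
    n = len(y)
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Hypothetical 2-D descriptors: a chain of gradually rotating views (0-4) plus a distractor (5).
database = [[1.0, 0.0], [0.94, 0.34], [0.77, 0.64], [0.5, 0.87], [0.17, 0.98], [-1.0, 0.1]]
query = [1.0, 0.05]
S = normalize(knn_affinity(database, k=2))
y = query_vector(query, database, k=1)
f = diffuse(S, y)
print(sorted(range(len(database)), key=lambda i: f[i], reverse=True))
```

Only item 0 is initially matched to the query, yet every member of the chain ends up with a positive score through its neighbors, while the distractor (index 5) shares no edge with the chain and keeps a score of zero.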

Repeatability Is Not Enough: Learning Affine Regions via Discriminability

ducha-aiki/affnet • ECCV 2018

A method for learning local affine-covariant regions is presented.

Single Shot Scene Text Retrieval

lluisgomez/single-shot-str • ECCV 2018

In this way, the text-based image retrieval task can be cast as a simple nearest-neighbor search of the query text representation over the outputs of the CNN for the entire image database.
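
The nearest-neighbor formulation can be sketched as follows. The paper uses a pyramidal histogram of characters (PHOC) as the text representation; here a simplified PHOC with three pyramid levels stands in for it, and the CNN's per-image predictions are replaced by the exact PHOCs of words assumed to appear in each image (a perfect-recognizer assumption). The alphabet, pyramid levels, image names, and helper functions are hypothetical.

```python
import math
import string
from typing import Dict, List, Sequence, Tuple

ALPHABET = string.ascii_lowercase + string.digits
LEVELS = (1, 2, 3)  # hypothetical pyramid levels


def phoc(word: str) -> List[float]:
    """Simplified pyramidal histogram of characters (PHOC) for one word."""
    word = word.lower()
    n = len(word)
    vec: List[float] = []
    for level in LEVELS:
        for region in range(level):
            r0, r1 = region / level, (region + 1) / level
            bits = [0.0] * len(ALPHABET)
            for i, ch in enumerate(word):
                if ch not in ALPHABET:
                    continue
                c0, c1 = i / n, (i + 1) / n
                overlap = max(0.0, min(c1, r1) - max(c0, r0))
                # Assign the character to this region if at least half of it lies
                # inside (small tolerance for floating-point rounding).
                if overlap / (c1 - c0) >= 0.5 - 1e-9:
                    bits[ALPHABET.index(ch)] = 1.0
            vec.extend(bits)
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; zero if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def text_search(query_word: str, image_words: Dict[str, List[str]],
                k: int = 3) -> List[Tuple[str, float]]:
    """Rank images by the best match between the query PHOC and their word PHOCs."""
    q = phoc(query_word)
    scores = {image: max(cosine(q, phoc(w)) for w in words)
              for image, words in image_words.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Stand-in for the CNN's outputs: words assumed to be visible in each image.
image_words = {
    "shop_sign.jpg": ["bakery", "open"],
    "street.jpg": ["stop"],
    "poster.jpg": ["concert", "2018"],
}
print(text_search("bakery", image_words))
```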

Detect-to-Retrieve: Efficient Regional Aggregation for Image Search

tensorflow/models • CVPR 2019

Then, we demonstrate how a trained landmark detector, using our new dataset, can be leveraged to index image regions and improve retrieval accuracy while being much more efficient than existing regional methods.
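
The benefit of indexing detected regions can be illustrated with the simplest regional scheme: store a descriptor for the full image and for each detected landmark box, and score an image by its best-matching region. This is not the paper's method. Indexing every region separately, as below, multiplies the number of stored vectors, and the paper's point is to aggregate regional information far more efficiently, which this sketch does not attempt. The descriptors, image names, and helper functions are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_region_index(images: Dict[str, List[Sequence[float]]]) -> List[Tuple[str, Sequence[float]]]:
    """Flatten per-image region descriptors into one list of (image_id, descriptor)."""
    return [(image_id, desc) for image_id, regions in images.items() for desc in regions]


def search_regions(query: Sequence[float],
                   index: List[Tuple[str, Sequence[float]]],
                   k: int = 2) -> List[Tuple[str, float]]:
    """Score each image by its best-matching region, then rank the images."""
    best: Dict[str, float] = {}
    for image_id, desc in index:
        score = cosine(query, desc)
        if score > best.get(image_id, float("-inf")):
            best[image_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical descriptors: the first entry per image describes the full image;
# "square.jpg" also has one from a (hypothetical) landmark detection box.
images = {
    "square.jpg": [[0.2, 0.9, 0.4], [0.95, 0.1, 0.05]],
    "park.jpg": [[0.3, 0.9, 0.3]],
    "tower.jpg": [[0.9, 0.2, 0.3]],
}
query = [1.0, 0.05, 0.0]

regional = search_regions(query, build_region_index(images), k=3)
full_image_only = {name: descs[:1] for name, descs in images.items()}
global_only = search_regions(query, build_region_index(full_image_only), k=3)
print("regions:", regional)
print("global: ", global_only)
```

Here the landmark fills only part of "square.jpg", so its full-image descriptor ranks it last, while the detected region moves it to first place.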
