TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text to Audio Retrieval	AudioCaps	QB-Norm+CE	R@1	23.9	# 7
Text to Audio Retrieval	AudioCaps	QB-Norm+CE	R@10	71.6±0.4	# 6
Video Retrieval	DiDeMo	QB-Norm+CLIP4Clip	text-to-video R@1	43.5	# 32
Video Retrieval	DiDeMo	QB-Norm+CLIP4Clip	text-to-video R@5	71.4	# 29
Video Retrieval	DiDeMo	QB-Norm+CLIP4Clip	text-to-video R@10	80.9	# 28
Video Retrieval	DiDeMo	QB-Norm+CLIP4Clip	text-to-video Median Rank	2.0	# 9
Video Retrieval	LSMDC	QB-Norm+CLIP4Clip	text-to-video R@1	22.4	# 23
Video Retrieval	LSMDC	QB-Norm+CLIP4Clip	text-to-video R@5	40.1	# 20
Video Retrieval	LSMDC	QB-Norm+CLIP4Clip	text-to-video R@10	49.5	# 20
Video Retrieval	LSMDC	QB-Norm+CLIP4Clip	text-to-video Median Rank	11.0	# 10
Video Retrieval	MSR-VTT-1kA	QB-Norm+CLIP2Video	text-to-video R@1	47.2	# 27
Video Retrieval	MSR-VTT-1kA	QB-Norm+CLIP2Video	text-to-video R@5	73.0	# 26
Video Retrieval	MSR-VTT-1kA	QB-Norm+CLIP2Video	text-to-video R@10	83.0	# 25
Video Retrieval	MSR-VTT-1kA	QB-Norm+CLIP2Video	text-to-video Median Rank	2	# 10
Video Retrieval	MSVD	QB-Norm+CLIP2Video	text-to-video R@1	48.0	# 14
Video Retrieval	MSVD	QB-Norm+CLIP2Video	text-to-video R@5	77.9	# 12
Video Retrieval	MSVD	QB-Norm+CLIP2Video	text-to-video R@10	86.2	# 11
Video Retrieval	MSVD	QB-Norm+CLIP2Video	text-to-video Median Rank	2.0	# 8
Video Retrieval	QuerYD	QB-Norm+TT-CE+	text-to-video R@1	15.1	# 5
Metric Learning	Stanford Online Products	QB-Norm+RDML	R@1	78.1	# 30
Video Retrieval	VATEX	QB-Norm+CLIP2Video	text-to-video R@1	58.8	# 10
Video Retrieval	VATEX	QB-Norm+CLIP2Video	text-to-video R@10	93.8	# 7
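
For readers unfamiliar with the metric names in the table above, the following is a minimal sketch of how R@K and Median Rank are commonly computed from a query-by-gallery similarity matrix. The function name `retrieval_metrics` is hypothetical. The sketch assumes exactly one correct gallery item per query, stored at the same index as the query, and it breaks ties optimistically. Individual benchmarks may use different protocols (for example, several captions per video), so this is illustrative rather than the evaluation code behind the reported numbers.

```python
import numpy as np


def retrieval_metrics(sims, ks=(1, 5, 10)):
    """Compute R@K (in percent) and Median Rank.

    sims[i, j] is the similarity of query i to gallery item j.
    Assumption: the correct gallery item for query i sits at index i.
    """
    sims = np.asarray(sims, dtype=float)
    # Similarity each query assigns to its own correct item.
    correct = np.diag(sims)[:, None]
    # Rank of the correct item: 1 + number of items scoring strictly higher.
    ranks = 1 + (sims > correct).sum(axis=1)
    metrics = {f"R@{k}": 100.0 * float(np.mean(ranks <= k)) for k in ks}
    metrics["Median Rank"] = float(np.median(ranks))
    return metrics
```

Higher R@K is better, while a lower Median Rank is better, which is why a Median Rank of 2.0 can rank well even when R@1 is below 50.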

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cross-modal-retrieval-with-querybank/video-retrieval-on-queryd)](https://paperswithcode.com/sota/video-retrieval-on-queryd?p=cross-modal-retrieval-with-querybank)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cross-modal-retrieval-with-querybank/text-to-audio-retrieval-on-audiocaps)](https://paperswithcode.com/sota/text-to-audio-retrieval-on-audiocaps?p=cross-modal-retrieval-with-querybank)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cross-modal-retrieval-with-querybank/video-retrieval-on-vatex)](https://paperswithcode.com/sota/video-retrieval-on-vatex?p=cross-modal-retrieval-with-querybank)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cross-modal-retrieval-with-querybank/video-retrieval-on-msvd)](https://paperswithcode.com/sota/video-retrieval-on-msvd?p=cross-modal-retrieval-with-querybank)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cross-modal-retrieval-with-querybank/video-retrieval-on-lsmdc)](https://paperswithcode.com/sota/video-retrieval-on-lsmdc?p=cross-modal-retrieval-with-querybank)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cross-modal-retrieval-with-querybank/video-retrieval-on-msr-vtt-1ka)](https://paperswithcode.com/sota/video-retrieval-on-msr-vtt-1ka?p=cross-modal-retrieval-with-querybank)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cross-modal-retrieval-with-querybank/metric-learning-on-stanford-online-products-1)](https://paperswithcode.com/sota/metric-learning-on-stanford-online-products-1?p=cross-modal-retrieval-with-querybank)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cross-modal-retrieval-with-querybank/video-retrieval-on-didemo)](https://paperswithcode.com/sota/video-retrieval-on-didemo?p=cross-modal-retrieval-with-querybank)`

Cross Modal Retrieval with Querybank Normalisation

CVPR 2022 · Simion-Vlad Bogolin, Ioana Croitoru, Hailin Jin, Yang Liu, Samuel Albanie

Profiting from large-scale training datasets, advances in neural architecture design and efficient inference, joint embeddings have become the dominant approach for tackling cross-modal retrieval. In this work we first show that, despite their effectiveness, state-of-the-art joint embeddings suffer significantly from the longstanding "hubness problem" in which a small number of gallery embeddings form the nearest neighbours of many queries. Drawing inspiration from the NLP literature, we formulate a simple but effective framework called Querybank Normalisation (QB-Norm) that re-normalises query similarities to account for hubs in the embedding space. QB-Norm improves retrieval performance without requiring retraining. Differently from prior work, we show that QB-Norm works effectively without concurrent access to any test set queries. Within the QB-Norm framework, we also propose a novel similarity normalisation method, the Dynamic Inverted Softmax, that is significantly more robust than existing approaches. We showcase QB-Norm across a range of cross modal retrieval models and benchmarks where it consistently enhances strong baselines beyond the state of the art. Code is available at https://vladbogo.github.io/QB-Norm/.
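
The abstract does not spell out the formulas, so the following is only a sketch of one plausible reading of the idea, using the inverted softmax as it is usually formulated in the NLP literature. Each gallery item's score is divided by the total response that item gives to a bank of probe queries, so hubs (items that respond strongly to many probes) are down-weighted. The "dynamic" variant here applies that normalisation only to queries whose top match is an item that some probe already retrieves. The function names, the inverse temperature `beta`, its default value, and the `top_k` parameter are all assumptions made for illustration; consult the official code for the authors' implementation.

```python
import numpy as np


def inverted_softmax(query_sims, bank_sims, beta=20.0):
    """Re-normalise query-to-gallery similarities with a querybank.

    query_sims: (num_queries, num_gallery) similarities for the test queries.
    bank_sims:  (num_bank, num_gallery) similarities for the querybank probes.
    beta:       inverse temperature (hypothetical default).
    """
    # Per-gallery-item normaliser: total response of each item to the bank.
    # Hubs respond strongly to many probes, so they get a large normaliser.
    normaliser = np.exp(beta * bank_sims).sum(axis=0, keepdims=True)
    return np.exp(beta * query_sims) / normaliser


def dynamic_inverted_softmax(query_sims, bank_sims, beta=20.0, top_k=1):
    """Apply the inverted softmax only where a likely hub is retrieved."""
    # Activation set: gallery items in the top-k of at least one bank probe.
    top = np.argsort(-bank_sims, axis=1)[:, :top_k]
    activated = np.zeros(bank_sims.shape[1], dtype=bool)
    activated[top.ravel()] = True

    # Normalise only the queries whose best match is an activated item.
    needs_norm = activated[query_sims.argmax(axis=1)]
    out = query_sims.astype(float).copy()
    out[needs_norm] = inverted_softmax(query_sims[needs_norm], bank_sims, beta)
    # Rows may now be on different scales; only within-row ranking is used.
    return out


# Toy data: gallery item 0 is a hub that every probe and most queries prefer.
queries = np.array([[0.90, 0.80, 0.10],
                    [0.85, 0.20, 0.80],
                    [0.10, 0.90, 0.20]])
bank = np.array([[0.9, 0.1, 0.1],
                 [0.9, 0.2, 0.1],
                 [0.9, 0.1, 0.2]])
normalised = dynamic_inverted_softmax(queries, bank)
```

In this toy case the first two queries initially retrieve the hub (item 0) and are re-ranked, while the third query, whose top match is not in the activation set, keeps its original similarities. Because the normaliser depends only on the querybank, nothing needs to be retrained, which matches the abstract's claim that the method works on top of existing embeddings.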

Code

ioanacroi/qb-norm official

Tasks

Cross-Modal Retrieval

Metric Learning

Retrieval

Text to Audio Retrieval

Video Retrieval

Datasets

MSR-VTT

MSVD

Stanford Online Products

DiDeMo

AudioCaps

LSMDC

VATEX

QuerYD

Results from the Paper

Ranked #5 on Video Retrieval on QuerYD

Methods

Softmax
