Video Retrieval
221 papers with code • 18 benchmarks • 31 datasets
The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that corresponds to the text query. Typically, the candidates are returned as a ranked list and scored with document-retrieval metrics.
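As a concrete illustration of the scoring step, the sketch below computes two common document-retrieval metrics, Recall@K and median rank, from a query-by-candidate similarity matrix. The matrix values, the diagonal ground-truth convention, and the function name are illustrative assumptions, not taken from any particular benchmark implementation.

```python
import numpy as np

def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute Recall@K and median rank from a similarity matrix.

    sim[i, j] is the similarity between text query i and candidate
    video j; the ground-truth video for query i is assumed to be
    video i (a common convention on paired retrieval benchmarks).
    """
    n = sim.shape[0]
    order = np.argsort(-sim, axis=1)  # candidates sorted by descending similarity
    # Rank of the ground-truth video for each query (1 = best).
    ranks = np.array([np.where(order[i] == i)[0][0] + 1 for i in range(n)])
    metrics = {f"R@{k}": float(np.mean(ranks <= k)) for k in ks}
    metrics["MedR"] = float(np.median(ranks))
    return metrics

# Toy example: 4 queries, 4 candidate videos.
sim = np.array([
    [0.9, 0.1, 0.2, 0.0],  # query 0 ranks its video first
    [0.3, 0.8, 0.1, 0.2],  # query 1 ranks its video first
    [0.7, 0.2, 0.5, 0.1],  # query 2 ranks its video second
    [0.1, 0.0, 0.2, 0.9],  # query 3 ranks its video first
])
print(retrieval_metrics(sim))  # → {'R@1': 0.75, 'R@5': 1.0, 'R@10': 1.0, 'MedR': 1.0}
```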
Libraries
Use these libraries to find Video Retrieval models and implementations.
Latest papers with no code
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
In particular, text-video retrieval, which aims to find the best-matching videos in a vast video corpus given text descriptions, is an essential function; its primary challenge is bridging the modality gap.
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
Text-video retrieval aims to find the most relevant cross-modal samples for a given query.
Event-aware Video Corpus Moment Retrieval
Video Corpus Moment Retrieval (VCMR) is a practical video retrieval task focused on identifying a specific moment within a vast corpus of untrimmed videos using a natural language query.
Video Editing for Video Retrieval
The teacher model is employed to edit the clips in the training set whereas the student model trains on the edited clips.
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing
To bridge the gap between modalities, CoAVT employs a query encoder, which contains a set of learnable query embeddings and extracts the audiovisual features most informative for the corresponding text.
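The core mechanism such a query encoder relies on, a set of learnable query embeddings cross-attending over audio-visual tokens to pool them into a fixed-size summary, can be sketched in a few lines. This is a minimal single-head illustration, not CoAVT's actual architecture; all dimensions and the random features are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: 8 learnable queries, 20 audio-visual tokens, width 64.
n_queries, n_tokens, d = 8, 20, 64
queries = rng.normal(size=(n_queries, d))    # learnable query embeddings
av_tokens = rng.normal(size=(n_tokens, d))   # audio-visual token features

# One cross-attention step: each query attends over all audio-visual
# tokens and pools them into a single summary vector.
attn = softmax(queries @ av_tokens.T / np.sqrt(d), axis=-1)
summary = attn @ av_tokens                   # shape (n_queries, d)
print(summary.shape)  # → (8, 64)
```

During training, the query embeddings (and projection layers, omitted here) would be optimized so that the pooled summaries align with the paired text features.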
Distilling Vision-Language Models on Millions of Videos
Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%.
Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks
Compared to conventional textual retrieval, the main obstacle for text-video retrieval is the semantic gap between the textual nature of queries and the visual richness of video content.
Detours for Navigating Instructional Videos
We introduce the video detours problem for navigating instructional videos.
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
To address this issue, we adopt multi-granularity visual feature learning, ensuring the model's comprehensiveness in capturing visual content features spanning from abstract to detailed levels during the training phase.
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts.