Video Retrieval

221 papers with code • 18 benchmarks • 31 datasets

The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that corresponds to the text query. Typically, the candidate videos are returned as a ranked list, and the ranking is scored with document retrieval metrics.
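To make the ranking-and-scoring setup concrete, here is a minimal sketch in plain Python. It assumes that text queries and videos have already been embedded into a shared vector space by some model, and that cosine similarity is the scoring function; both are assumptions for illustration, not something this page specifies. Recall@K and median rank are used as examples of retrieval metrics commonly reported for this task. The function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_videos(query_vec, video_vecs):
    """Return candidate video indices sorted from best to worst match."""
    scores = [cosine(query_vec, v) for v in video_vecs]
    return sorted(range(len(video_vecs)), key=lambda i: scores[i], reverse=True)

def recall_at_k(rankings, targets, k):
    """Fraction of queries whose ground-truth video appears in the top k."""
    hits = sum(1 for r, t in zip(rankings, targets) if t in r[:k])
    return hits / len(targets)

def median_rank(rankings, targets):
    """Median 1-based position of the ground-truth video across queries."""
    ranks = sorted(r.index(t) + 1 for r, t in zip(rankings, targets))
    mid = len(ranks) // 2
    return ranks[mid] if len(ranks) % 2 else (ranks[mid - 1] + ranks[mid]) / 2

# Toy embeddings (made up for illustration): 3 text queries, 4 candidate videos.
video_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
query_vecs = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9], [0.6, 0.7, 0.1]]
targets = [0, 2, 1]  # index of the correct video for each query

rankings = [rank_videos(q, video_vecs) for q in query_vecs]
print("Rankings:", rankings)
print("R@1:", recall_at_k(rankings, targets, 1))
print("R@2:", recall_at_k(rankings, targets, 2))
print("MdR:", median_rank(rankings, targets))
```

In this toy setup the third query ranks its correct video second, so Recall@1 is 2/3 while Recall@2 is 1.0. Real benchmarks apply the same idea to much larger candidate pools.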

Latest papers with no code

SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval

no code yet • 22 Apr 2024

In particular, text-video retrieval, which aims to find the top matching videos given text descriptions from a vast video corpus, is an essential function, the primary challenge of which is to bridge the modality gap.

ProTA: Probabilistic Token Aggregation for Text-Video Retrieval

no code yet • 18 Apr 2024

Text-video retrieval aims to find the most relevant cross-modal samples for a given query.

Event-aware Video Corpus Moment Retrieval

no code yet • 21 Feb 2024

Video Corpus Moment Retrieval (VCMR) is a practical video retrieval task focused on identifying a specific moment within a vast corpus of untrimmed videos using the natural language query.

Video Editing for Video Retrieval

no code yet • 4 Feb 2024

The teacher model is employed to edit the clips in the training set whereas the student model trains on the edited clips.

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

no code yet • 22 Jan 2024

To bridge the gap between modalities, CoAVT employs a query encoder, which contains a set of learnable query embeddings, and extracts the most informative audiovisual features of the corresponding text.

Distilling Vision-Language Models on Millions of Videos

no code yet • 11 Jan 2024

Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%.

Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks

no code yet • 6 Jan 2024

Compared to conventional textual retrieval, the main obstacle for text-video retrieval is the semantic gap between the textual nature of queries and the visual richness of video content.

Detours for Navigating Instructional Videos

no code yet • 3 Jan 2024

We introduce the video detours problem for navigating instructional videos.

Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning

no code yet • 1 Jan 2024

To address this issue, we adopt multi-granularity visual feature learning, ensuring the model's comprehensiveness in capturing visual content features spanning from abstract to detailed levels during the training phase.

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision

no code yet • 20 Dec 2023

To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts.