Video-Text Retrieval

47 papers with code • 1 benchmark • 5 datasets

Video-text retrieval requires joint understanding of both video and language, which distinguishes it from the video retrieval task.
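As a rough sketch of the task (assuming pre-computed embeddings from any dual-encoder model such as CLIP; the function and variable names below are placeholders, not a specific library's API), text-to-video retrieval ranks candidate videos by cross-modal similarity to the query:

```python
import numpy as np

def retrieve_videos(text_emb, video_embs, top_k=5):
    """Rank videos by cosine similarity to a query text embedding.

    text_emb:   (d,) query embedding from a text encoder (placeholder).
    video_embs: (n, d) embeddings of n candidate videos (placeholder).
    """
    # L2-normalize so the dot product equals cosine similarity.
    text_emb = text_emb / np.linalg.norm(text_emb)
    video_embs = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    sims = video_embs @ text_emb              # (n,) similarity scores
    return np.argsort(-sims)[:top_k]          # indices of the best-matching videos

# Usage with random placeholders standing in for real encoder outputs.
rng = np.random.default_rng(0)
query = rng.normal(size=512)
videos = rng.normal(size=(100, 512))
print(retrieve_videos(query, videos))
```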

Libraries

Use these libraries to find Video-Text Retrieval models and implementations

Most implemented papers

Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring

farewellthree/stan CVPR 2023

In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key point for extending image-text pretrained models to the video domain.
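For context, the simplest temporal model when transferring CLIP to video is parameter-free mean pooling of per-frame image features; works such as STAN study what to use instead. A minimal sketch, assuming per-frame CLIP features are already computed (the names below are illustrative):

```python
import torch

def video_embedding_from_frames(frame_features):
    """Mean-pool per-frame CLIP image features into one video embedding.

    frame_features: (num_frames, d) tensor, e.g. CLIP image-encoder outputs
    for sampled frames (hypothetical inputs for illustration). This
    parameter-free pooling is the simplest temporal model; learned temporal
    modules replace it in papers like the one above.
    """
    video_emb = frame_features.mean(dim=0)
    return video_emb / video_emb.norm()       # normalize for cosine retrieval

frames = torch.randn(8, 512)                  # 8 sampled frames, 512-d features
print(video_embedding_from_frames(frames).shape)  # torch.Size([512])
```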

Video-Text Retrieval by Supervised Sparse Multi-Grained Learning

yimuwangcs/Better_Cross_Modal_Retrieval 19 Feb 2023

While recent progress in video-text retrieval has been advanced by the exploration of better representation learning, in this paper, we present a novel multi-grained sparse learning framework, S3MA, to learn an aligned sparse space shared between the video and the text for video-text retrieval.
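As a loose illustration of matching in a shared sparse space (a generic sketch under assumed design choices, not S3MA's actual multi-grained framework), both modalities can be projected to a high-dimensional non-negative code and compared by the overlap of active dimensions; real methods typically add regularization to encourage sparsity:

```python
import torch
import torch.nn as nn

sparse_dim = 4096
video_proj = nn.Linear(512, sparse_dim)       # illustrative projection heads
text_proj = nn.Linear(512, sparse_dim)

def sparse_similarity(video_emb, text_emb):
    v = torch.relu(video_proj(video_emb))     # non-negative codes; ReLU zeroes many dims
    t = torch.relu(text_proj(text_emb))
    return (v * t).sum()                      # overlap of jointly active dimensions

print(sparse_similarity(torch.randn(512), torch.randn(512)))
```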

CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning

FangyunWei/SLRT CVPR 2023

Our framework, termed as domain-aware sign language retrieval via Cross-lingual Contrastive learning or CiCo for short, outperforms the pioneering method by large margins on various datasets, e.g., +22.4 T2V and +28.0 V2T R@1 improvements on How2Sign dataset, and +13.7 T2V and +17.1 V2T R@1 improvements on PHOENIX-2014T dataset.

SViTT: Temporal Learning of Sparse Video-Text Transformers

jerryyli/svitt CVPR 2023

Do video-text transformers learn to model temporal relationships across frames?

Global and Local Semantic Completion Learning for Vision-Language Pre-training

iigroup/scl 12 Jun 2023

MGSC promotes learning more representative global features, which have a great impact on the performance of downstream tasks, while MLTC reconstructs modal-fusion local tokens, further enhancing accurate comprehension of multimodal data.

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

chuhanxx/helping_hand_for_egocentric_videos ICCV 2023

We demonstrate the performance of the object-aware representations learnt by our model, by: (i) evaluating it for strong transfer, i.e. through zero-shot testing, on a number of downstream video-text retrieval and classification benchmarks; and (ii) by using the representations learned as input for long-term video understanding tasks (e.g. Episodic Memory in Ego4D).

Multi-event Video-Text Retrieval

gengyuanmax/mevtr ICCV 2023

In this study, we introduce the Multi-event Video-Text Retrieval (MeVTR) task, addressing scenarios in which each video contains multiple different events, as a niche scenario of the conventional Video-Text Retrieval Task.

UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory

Paranioar/UniPT 28 Aug 2023

Parameter-efficient transfer learning (PETL), i.e., fine-tuning a small portion of parameters, is an effective strategy for adapting pre-trained models to downstream domains.
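A minimal sketch of the PETL idea (freeze the pre-trained backbone and train only a small added module; the adapter shape here is illustrative, not UniPT's architecture):

```python
import torch
import torch.nn as nn

# Frozen pre-trained backbone (stand-in for a real vision-language model).
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=6,
)
for p in backbone.parameters():
    p.requires_grad = False                   # backbone stays frozen

# Tiny trainable bottleneck adapter; only these parameters are updated.
adapter = nn.Sequential(
    nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 512),
)

trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"trainable params: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```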

Unified Coarse-to-Fine Alignment for Video-Text Retrieval

ziyang412/ucofia ICCV 2023

Specifically, our model captures the cross-modal similarity information at different granularity levels.
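As a hedged sketch of coarse-to-fine matching (the names and the fusion rule are assumptions for illustration, not UCoFiA's exact formulation), a coarse sentence-video similarity can be combined with a fine word-frame similarity:

```python
import torch
import torch.nn.functional as F

def coarse_to_fine_similarity(sent_emb, word_embs, video_emb, frame_embs, alpha=0.5):
    """Combine coarse (sentence-video) and fine (word-frame) similarities.

    sent_emb:  (d,)  sentence embedding      word_embs:  (num_words, d)
    video_emb: (d,)  pooled video embedding  frame_embs: (num_frames, d)
    """
    coarse = F.cosine_similarity(sent_emb, video_emb, dim=0)
    # Fine-grained: each word matched to its best frame, then averaged.
    fine_matrix = F.normalize(word_embs, dim=1) @ F.normalize(frame_embs, dim=1).T
    fine = fine_matrix.max(dim=1).values.mean()
    return alpha * coarse + (1 - alpha) * fine

s, w = torch.randn(512), torch.randn(12, 512)
v, f = torch.randn(512), torch.randn(8, 512)
print(coarse_to_fine_similarity(s, w, v, f))
```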

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

leolee99/pau NeurIPS 2023

In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arisen from the inherent data ambiguity.
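As an assumed, simplified stand-in for prototype-based uncertainty (not PAU's actual formulation), one way to score data ambiguity is the entropy of a sample's affinity distribution over a set of learned prototypes:

```python
import torch
import torch.nn.functional as F

def prototype_uncertainty(embedding, prototypes):
    """Entropy of the similarity distribution over learned prototypes as a
    rough ambiguity score (illustrative only).

    embedding:  (d,)   normalized sample embedding.
    prototypes: (k, d) normalized prototype vectors.
    """
    probs = F.softmax(prototypes @ embedding, dim=0)   # affinity to each prototype
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    return entropy                                     # higher entropy => more ambiguous

protos = F.normalize(torch.randn(16, 512), dim=1)
x = F.normalize(torch.randn(512), dim=0)
print(prototype_uncertainty(x, protos))
```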