A Straightforward Framework For Video Retrieval Using CLIP

Video retrieval is a challenging task in which a text query is matched to a video, or vice versa. Most existing approaches to this problem rely on user-provided annotations. Although simple, this strategy is not always feasible in practice. In this work, we explore the application of the language-image model CLIP to obtain video representations without the need for such annotations. CLIP was explicitly trained to learn a common embedding space in which images and text can be compared. Using the techniques described in this document, we extend its application to videos, obtaining state-of-the-art results on the MSR-VTT and MSVD benchmarks.
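The excerpt does not include the paper's code, but the core retrieval idea it describes (embed video frames and text into CLIP's shared space, aggregate frame embeddings into a video vector, and rank by cosine similarity) can be sketched in a few lines. The sketch below is an illustrative assumption, not the paper's exact aggregation technique: it assumes per-frame CLIP image embeddings and a CLIP text embedding have already been computed, and stands in random vectors for them.

```python
import numpy as np

def video_embedding(frame_embs):
    # Mean-pool per-frame CLIP image embeddings into a single video vector
    # (an assumed aggregation, one of several possible choices), then
    # L2-normalize so that dot products equal cosine similarities.
    v = frame_embs.mean(axis=0)
    return v / np.linalg.norm(v)

def rank_videos(text_emb, video_embs):
    # Cosine similarity between the normalized text query embedding and
    # each video embedding; return video indices from best to worst match.
    sims = video_embs @ text_emb
    return np.argsort(-sims)

# Toy example: random vectors stand in for real CLIP embeddings (dim 512).
rng = np.random.default_rng(0)
videos = np.stack([video_embedding(rng.normal(size=(8, 512)))
                   for _ in range(5)])
# A query embedding deliberately close to video 3.
query = videos[3] + 0.01 * rng.normal(size=512)
query /= np.linalg.norm(query)
ranking = rank_videos(query, videos)
print(ranking[0])
```

The same ranking works in the video-to-text direction by swapping the roles of the query and the gallery, which is why the tables below report both text-to-video and video-to-text metrics.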


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Video Retrieval | LSMDC | CLIP | text-to-video R@1 | 11.3 | # 31 |
| Video Retrieval | LSMDC | CLIP | text-to-video R@5 | 22.7 | # 29 |
| Video Retrieval | LSMDC | CLIP | text-to-video R@10 | 29.2 | # 29 |
| Video Retrieval | LSMDC | CLIP | text-to-video Median Rank | 56.5 | # 22 |
| Video Retrieval | LSMDC | CLIP | video-to-text R@1 | 6.8 | # 14 |
| Video Retrieval | LSMDC | CLIP | video-to-text R@5 | 16.4 | # 11 |
| Video Retrieval | LSMDC | CLIP | video-to-text R@10 | 22.1 | # 10 |
| Video Retrieval | LSMDC | CLIP | video-to-text Median Rank | 73 | # 6 |
| Video Retrieval | MSR-VTT | CLIP | text-to-video R@1 | 21.4 | # 31 |
| Video Retrieval | MSR-VTT | CLIP | text-to-video R@5 | 41.1 | # 29 |
| Video Retrieval | MSR-VTT | CLIP | text-to-video R@10 | 50.4 | # 29 |
| Video Retrieval | MSR-VTT | CLIP | text-to-video Median Rank | 10 | # 13 |
| Video Retrieval | MSR-VTT | CLIP | video-to-text R@1 | 40.3 | # 9 |
| Video Retrieval | MSR-VTT | CLIP | video-to-text R@5 | 69.7 | # 7 |
| Video Retrieval | MSR-VTT | CLIP | video-to-text R@10 | 79.2 | # 7 |
| Video Retrieval | MSR-VTT | CLIP | video-to-text Median Rank | 2 | # 3 |
| Video Retrieval | MSR-VTT-1kA | CLIP | text-to-video R@1 | 31.2 | # 43 |
| Video Retrieval | MSR-VTT-1kA | CLIP | text-to-video R@5 | 53.7 | # 49 |
| Video Retrieval | MSR-VTT-1kA | CLIP | text-to-video R@10 | 64.2 | # 52 |
| Video Retrieval | MSR-VTT-1kA | CLIP | text-to-video Median Rank | 4 | # 28 |
| Video Retrieval | MSR-VTT-1kA | CLIP | video-to-text R@1 | 27.2 | # 23 |
| Video Retrieval | MSR-VTT-1kA | CLIP | video-to-text R@5 | 51.7 | # 21 |
| Video Retrieval | MSR-VTT-1kA | CLIP | video-to-text R@10 | 62.6 | # 22 |
| Video Retrieval | MSR-VTT-1kA | CLIP | video-to-text Median Rank | 5 | # 18 |
| Video Retrieval | MSVD | CLIP | text-to-video R@1 | 37 | # 21 |
| Video Retrieval | MSVD | CLIP | text-to-video R@5 | 64.1 | # 20 |
| Video Retrieval | MSVD | CLIP | text-to-video R@10 | 73.8 | # 19 |
| Video Retrieval | MSVD | CLIP | text-to-video Median Rank | 3.0 | # 14 |
| Video Retrieval | MSVD | CLIP | video-to-text R@1 | 59.9 | # 15 |
| Video Retrieval | MSVD | CLIP | video-to-text R@5 | 85.2 | # 12 |
| Video Retrieval | MSVD | CLIP | video-to-text R@10 | 90.7 | # 12 |
| Video Retrieval | MSVD | CLIP | video-to-text Median Rank | 1 | # 1 |

Methods


No methods listed for this paper.