TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Highlight Detection	QVHighlights	UniVTG (w/ PT)	mAP	40.54	# 4
Highlight Detection	QVHighlights	UniVTG (w/ PT)	Hit@1	66.28	# 3
Highlight Detection	QVHighlights	UniVTG	mAP	38.20	# 10
Highlight Detection	QVHighlights	UniVTG	Hit@1	60.96	# 10
Moment Retrieval	QVHighlights	UniVTG (w/ PT)	mAP	43.63	# 8
Moment Retrieval	QVHighlights	UniVTG (w/ PT)	R@1 IoU=0.5	65.43	# 5
Moment Retrieval	QVHighlights	UniVTG (w/ PT)	R@1 IoU=0.7	50.06	# 3
Moment Retrieval	QVHighlights	UniVTG (w/ PT)	mAP@0.5	64.06	# 10
Moment Retrieval	QVHighlights	UniVTG (w/ PT)	mAP@0.75	45.02	# 7
Moment Retrieval	QVHighlights	UniVTG	mAP	35.47	# 19
Moment Retrieval	QVHighlights	UniVTG	R@1 IoU=0.5	58.86	# 20
Moment Retrieval	QVHighlights	UniVTG	R@1 IoU=0.7	40.86	# 19
Moment Retrieval	QVHighlights	UniVTG	mAP@0.5	57.60	# 17
Moment Retrieval	QVHighlights	UniVTG	mAP@0.75	35.59	# 16
Natural Language Moment Retrieval	TACoS	UniVTG	R@1,IoU=0.3	51.44	# 3
Natural Language Moment Retrieval	TACoS	UniVTG	R@1,IoU=0.5	34.97	# 5
Natural Language Moment Retrieval	TACoS	UniVTG	R@1,IoU=0.7	21.07	# 4
Natural Language Moment Retrieval	TACoS	UniVTG	mIoU	35.76	# 3
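
The moment-retrieval rows above report R@1 at several IoU thresholds and mIoU. As a rough illustration of how such numbers are typically computed, here is a minimal sketch. It is not the benchmarks' official evaluation code; the function names are hypothetical, and it assumes exactly one ground-truth span per query, which does not hold for every dataset.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec); hypothetical type alias


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span], iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


def mean_iou(top1_preds: List[Span], gts: List[Span]) -> float:
    """Average IoU of the top-1 predictions, as a percentage."""
    total = sum(temporal_iou(p, g) for p, g in zip(top1_preds, gts))
    return 100.0 * total / len(gts)


# Toy data (made up for illustration, not taken from any dataset).
preds = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]
gts = [(0.0, 10.0), (10.0, 20.0), (0.0, 5.0)]

r1_at_03 = recall_at_1(preds, gts, 0.3)
r1_at_05 = recall_at_1(preds, gts, 0.5)
miou = mean_iou(preds, gts)
```

A stricter threshold can only lower R@1, which is why the IoU=0.7 figures in the table are below the IoU=0.5 and IoU=0.3 figures for the same model.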

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/univtg-towards-unified-video-language/natural-language-moment-retrieval-on-tacos)](https://paperswithcode.com/sota/natural-language-moment-retrieval-on-tacos?p=univtg-towards-unified-video-language)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/univtg-towards-unified-video-language/highlight-detection-on-qvhighlights)](https://paperswithcode.com/sota/highlight-detection-on-qvhighlights?p=univtg-towards-unified-video-language)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/univtg-towards-unified-video-language/moment-retrieval-on-qvhighlights)](https://paperswithcode.com/sota/moment-retrieval-on-qvhighlights?p=univtg-towards-unified-video-language)`

UniVTG: Towards Unified Video-Language Temporal Grounding

ICCV 2023 · Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou

Video Temporal Grounding (VTG), which aims to ground target clips from videos (such as consecutive intervals or disjoint shots) according to custom language queries (e.g., sentences or words), is key for video browsing on social media. Most methods in this direction develop task-specific models that are trained with type-specific labels, such as moment retrieval (time interval) and highlight detection (worthiness curve), which limits their ability to generalize to various VTG tasks and labels. In this paper, we propose to Unify the diverse VTG labels and tasks, dubbed UniVTG, along three directions: Firstly, we revisit a wide range of VTG labels and tasks and define a unified formulation. Based on this, we develop data annotation schemes to create scalable pseudo supervision. Secondly, we develop an effective and flexible grounding model capable of addressing each task and making full use of each label. Lastly, thanks to the unified framework, we are able to unlock temporal grounding pretraining from large-scale diverse labels and develop stronger grounding abilities, e.g., zero-shot grounding. Extensive experiments on three tasks (moment retrieval, highlight detection and video summarization) across seven datasets (QVHighlights, Charades-STA, TACoS, Ego4D, YouTube Highlights, TVSum, and QFVS) demonstrate the effectiveness and flexibility of our proposed framework. The code is available at https://github.com/showlab/UniVTG.
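
The abstract mentions a unified formulation across interval-style and curve-style labels but does not spell it out. One way to picture such a formulation is to give every clip the same kind of label, whichever task the annotation came from. The sketch below is an illustrative assumption, not the authors' code: `ClipLabel`, its fields, and `interval_to_clip_labels` are hypothetical names, and the paper's actual definitions may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClipLabel:
    """Hypothetical per-clip label shared by all tasks."""
    foreground: bool               # does the clip fall inside a target interval?
    offsets: Tuple[float, float]   # seconds from clip center to interval start / end
    saliency: float                # relevance score for the query


def interval_to_clip_labels(
    num_clips: int,
    clip_len: float,
    interval: Tuple[float, float],
    saliency: float = 1.0,
) -> List[ClipLabel]:
    """Turn one moment-retrieval interval into per-clip labels.

    Assumes fixed-length, non-overlapping clips and a constant saliency
    inside the interval (both are simplifications for illustration).
    """
    start, end = interval
    labels = []
    for i in range(num_clips):
        center = (i + 0.5) * clip_len
        if start <= center <= end:
            labels.append(ClipLabel(True, (center - start, end - center), saliency))
        else:
            labels.append(ClipLabel(False, (0.0, 0.0), 0.0))
    return labels


# A 10-second video split into five 2-second clips, with a target moment at 2-6 s.
labels = interval_to_clip_labels(num_clips=5, clip_len=2.0, interval=(2.0, 6.0))
foreground_mask = [label.foreground for label in labels]
```

Under this picture, a highlight-detection curve would populate the `saliency` field directly, while an interval annotation populates `foreground` and `offsets`, so one model head layout could serve both label types.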

Code

showlab/univtg official

Tasks

Highlight Detection

Moment Retrieval

Natural Language Moment Retrieval

Retrieval

Video Summarization

Datasets

Charades-STA

TVSum

QVHighlights

Ego4D

TACoS Multi-Level Corpus

Results from the Paper

Ranked #3 on Natural Language Moment Retrieval on TACoS

