TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Moment Retrieval	Charades-STA	UVCOM	R@1 IoU=0.5	59.25	# 7
Moment Retrieval	Charades-STA	UVCOM	R@1 IoU=0.7	36.64	# 7
Moment Retrieval	QVHighlights	UVCOM	mAP	43.18	# 9
Moment Retrieval	QVHighlights	UVCOM	R@1 IoU=0.5	63.55	# 12
Moment Retrieval	QVHighlights	UVCOM	R@1 IoU=0.7	47.47	# 11
Moment Retrieval	QVHighlights	UVCOM	mAP@0.5	63.37	# 12
Moment Retrieval	QVHighlights	UVCOM	mAP@0.75	42.67	# 10
Moment Retrieval	QVHighlights	UVCOM (w/ PT ASR Captions)	mAP	43.8	# 7
Moment Retrieval	QVHighlights	UVCOM (w/ PT ASR Captions)	R@1 IoU=0.5	64.53	# 7
Moment Retrieval	QVHighlights	UVCOM (w/ PT ASR Captions)	R@1 IoU=0.7	48.31	# 8
Moment Retrieval	QVHighlights	UVCOM (w/ PT ASR Captions)	mAP@0.5	64.78	# 5
Moment Retrieval	QVHighlights	UVCOM (w/ PT ASR Captions)	mAP@0.75	43.65	# 8
Natural Language Moment Retrieval	TACoS	UVCOM	R@1 IoU=0.5	36.39	# 3
Natural Language Moment Retrieval	TACoS	UVCOM	R@1 IoU=0.7	23.32	# 2
Highlight Detection	TVSum	UVCOM (train from scratch)	mAP	86.3	# 3
Highlight Detection	YouTube Highlights	UVCOM	mAP	77.4	# 1
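The metrics above follow the usual conventions for these benchmarks: R@1 IoU=θ is the share of queries whose top-ranked predicted moment overlaps the ground-truth moment with temporal IoU of at least θ, and highlight-detection mAP is, roughly, the average precision of the clip-level saliency ranking, averaged over videos. The sketch below illustrates these quantities under simplifying assumptions: exactly one ground-truth window per query (as in Charades-STA and TACoS; QVHighlights can have several), binary clip labels, and results returned as fractions rather than the percentages shown in the table. The function names are hypothetical, and the official evaluation scripts handle details (multiple windows, annotator aggregation, top-k cutoffs) that are not reproduced here.

```python
from typing import Sequence, Tuple

# Illustrative sketch, not the official benchmark evaluation code.
# Assumption: a moment is a (start, end) pair in seconds.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: Sequence[Span], gts: Sequence[Span], iou_thr: float) -> float:
    """Fraction of queries whose top-1 predicted moment reaches IoU >= iou_thr.

    Assumes exactly one ground-truth window per query.
    """
    hits = sum(temporal_iou(p, g) >= iou_thr for p, g in zip(top1_preds, gts))
    return hits / len(gts)


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP of a clip ranking: mean precision at the rank of each positive clip.

    `scores` are predicted saliency scores, `labels` are binary highlight labels.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(videos: Sequence[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Mean of per-video AP, given one (scores, labels) pair per video."""
    return sum(average_precision(s, l) for s, l in videos) / len(videos)


if __name__ == "__main__":
    preds = [(0.0, 10.0), (5.0, 15.0)]
    gts = [(0.0, 8.0), (20.0, 30.0)]
    print(recall_at_1(preds, gts, 0.5))  # 0.5
    print(average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # about 0.833
```

Multiplying these fractions by 100 gives numbers on the same scale as the table, although the exact per-benchmark protocols differ from this simplified version.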

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

28 Nov 2023 · Yicheng Xiao, Zhuoyan Luo, Yong Liu, Yue Ma, Hengwei Bian, Yatai Ji, Yujiu Yang, Xiu Li

Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted significant attention due to the growing demand for video analysis. Recent approaches treat MR and HD as similar video grounding problems and address them together with a transformer-based architecture. However, we observe that the emphases of MR and HD differ, with one necessitating the perception of local relationships and the other prioritizing the understanding of global contexts. Consequently, the lack of task-specific design inevitably limits how well such models capture the intrinsic specialty of the two tasks. To tackle this issue, we propose a Unified Video COMprehension framework (UVCOM) to bridge the gap and jointly solve MR and HD effectively. By performing progressive integration of intra- and inter-modality information across multiple granularities, UVCOM achieves a comprehensive understanding of the video. Moreover, we present multi-aspect contrastive learning to consolidate local relation modeling and global knowledge accumulation via a well-aligned multi-modal space. Extensive experiments on the QVHighlights, Charades-STA, TACoS, YouTube Highlights and TVSum datasets demonstrate the effectiveness and rationality of UVCOM, which outperforms the state-of-the-art methods by a remarkable margin.
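The abstract mentions multi-aspect contrastive learning without giving its exact form. As a rough illustration of the general idea only, the sketch below computes a generic multi-positive InfoNCE loss between a text-query embedding and per-clip video embeddings, assuming that clips inside the ground-truth moment are positives and all other clips in the same video are negatives. This is not UVCOM's loss: the function names, the use of cosine similarity, and the default temperature of 0.07 are all illustrative assumptions.

```python
import math
from typing import Sequence

# Illustrative sketch of a generic contrastive objective, not UVCOM's actual loss.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def clip_query_infonce(
    query: Sequence[float],
    clips: Sequence[Sequence[float]],
    positive_mask: Sequence[bool],
    temperature: float = 0.07,  # assumed value, a common contrastive-learning default
) -> float:
    """Multi-positive InfoNCE: pull the query toward clips inside the target moment.

    Returns the mean negative log-softmax probability of the positive clips,
    where the softmax runs over all clips of the video.
    """
    logits = [cosine_similarity(query, c) / temperature for c in clips]
    # Numerically stable log-sum-exp over all clips (the softmax denominator).
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    positive_log_probs = [z - log_denominator for z, pos in zip(logits, positive_mask) if pos]
    if not positive_log_probs:
        raise ValueError("at least one positive clip is required")
    return -sum(positive_log_probs) / len(positive_log_probs)


if __name__ == "__main__":
    query = [1.0, 0.0]
    clips = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    aligned = clip_query_infonce(query, clips, [True, False, False])
    misaligned = clip_query_infonce(query, clips, [False, False, True])
    print(aligned < misaligned)  # True
```

The loss is small when the query embedding is most similar to the positive clips and large when it is most similar to negatives; a lower temperature sharpens the softmax over clips and penalizes such mismatches more strongly.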

Code

easonxiao-888/uvcom official

Tasks

Contrastive Learning

Highlight Detection

Moment Retrieval

Natural Language Moment Retrieval

Retrieval

Temporal Action Localization

Video Grounding

Datasets

Charades-STA

TVSum

QVHighlights

Results from the Paper

Ranked #1 on Highlight Detection on YouTube Highlights

Methods

Contrastive Learning
