Task	Dataset	Model	Metric	Value (%)	Global rank
Video Grounding	MAD	DenoiseLoc	R@1, IoU=0.1	11.59	#1
Video Grounding	MAD	DenoiseLoc	R@5, IoU=0.1	30.35	#1
Video Grounding	MAD	DenoiseLoc	R@10, IoU=0.1	41.44	#1
Video Grounding	MAD	DenoiseLoc	R@50, IoU=0.1	66.07	#1
Video Grounding	MAD	DenoiseLoc	R@100, IoU=0.1	73.62	#1
Moment Retrieval	QVHighlights	DenoiseLoc	R@1, IoU=0.5	59.27	#19
Moment Retrieval	QVHighlights	DenoiseLoc	R@1, IoU=0.7	45.07	#16
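The R@k, IoU=θ metrics in the results table follow the usual moment-retrieval convention. A query counts as a hit if any of its k highest-ranked predicted spans overlaps the ground-truth span with a temporal IoU of at least θ, and R@k is the percentage of queries that are hits. The minimal Python sketch below illustrates that convention. It assumes one ground-truth span per query and predictions already sorted by descending confidence. The function names `temporal_iou` and `recall_at_k` are illustrative and are not taken from the DenoiseLoc code or any benchmark toolkit. Official evaluation scripts may differ in details; for example, a QVHighlights query can have several ground-truth windows.

```python
from typing import Sequence, Tuple

Span = Tuple[float, float]  # (start, end), e.g. in seconds


def temporal_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    predictions: Sequence[Sequence[Span]],
    ground_truths: Sequence[Span],
    k: int,
    iou_threshold: float,
) -> float:
    """Percentage of queries with a top-k prediction reaching the IoU threshold.

    Assumes one ground-truth span per query and that each query's
    predictions are already sorted by descending confidence.
    """
    hits = sum(
        any(temporal_iou(p, gt) >= iou_threshold for p in preds[:k])
        for preds, gt in zip(predictions, ground_truths)
    )
    return 100.0 * hits / len(ground_truths)


# Two toy queries with two ranked predictions each.
preds = [[(0.0, 5.0), (10.0, 20.0)], [(30.0, 40.0), (2.0, 4.0)]]
gts = [(11.0, 19.0), (0.0, 10.0)]
print(recall_at_k(preds, gts, k=1, iou_threshold=0.1))  # 0.0
print(recall_at_k(preds, gts, k=2, iou_threshold=0.1))  # 100.0
print(recall_at_k(preds, gts, k=2, iou_threshold=0.5))  # 50.0
```

The toy example shows why the low IoU=0.1 threshold used on MAD is lenient. The second query counts as a hit at k=2 even though its best span covers only 20% of the union with the ground truth.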

Boundary-Denoising for Video Activity Localization

6 Apr 2023 · Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Pérez-Rúa, Bernard Ghanem

Video activity localization aims to understand the semantic content of long, untrimmed videos and to retrieve actions of interest. A retrieved action, together with its start and end locations, can be used for highlight generation, temporal action detection, and similar tasks. Unfortunately, learning the exact boundary locations of activities is highly challenging because temporal activities are continuous in time and there are often no clear-cut transitions between actions. Moreover, the definition of the start and end of an event is subjective, which may confuse the model. To alleviate this boundary ambiguity, we propose to study video activity localization from a denoising perspective. Specifically, we propose an encoder-decoder model named DenoiseLoc. During training, a set of action spans is randomly generated from the ground truth with a controlled noise scale. We then attempt to reverse this process through boundary denoising, which allows the localizer to predict activities with precise boundaries and leads to faster convergence. Experiments show that DenoiseLoc advances several video activity understanding tasks. For example, we observe a gain of +12.36% average mAP on the QVHighlights dataset and +1.64% mAP@0.5 on the THUMOS'14 dataset over the baseline. Moreover, DenoiseLoc achieves state-of-the-art performance on the TACoS and MAD datasets while using far fewer predictions than other current methods.
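The abstract describes training-time noise injection only at a high level. Spans are randomly generated from the ground truth with a controlled noise scale, and the decoder learns to recover the true boundaries. The sketch below shows one plausible way to produce such noisy spans. Several details are assumptions made for illustration and are not the paper's exact procedure:

- Gaussian perturbation of each span's center and width.
- The helper name `make_noisy_spans`.
- The (center, width) parameterization.
- Clipping to the video length.

```python
import random
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def make_noisy_spans(
    gt: Span,
    num_spans: int,
    noise_scale: float,
    video_len: float,
    rng: random.Random,
) -> List[Span]:
    """Hypothetical helper: draw num_spans noisy copies of one ground-truth span.

    Assumption: Gaussian noise is added to the span center (std proportional to
    noise_scale * width) and multiplicatively to the width (std noise_scale).
    This illustrates the idea, not the exact DenoiseLoc recipe.
    """
    start, end = gt
    center, width = (start + end) / 2.0, end - start
    noisy = []
    for _ in range(num_spans):
        c = center + rng.gauss(0.0, noise_scale * width)
        # Keep the width strictly positive before converting back to boundaries.
        w = max(1e-3, width * (1.0 + rng.gauss(0.0, noise_scale)))
        # Clip both boundaries to the video; clipping is monotone, so s <= e still holds.
        s = min(max(0.0, c - w / 2.0), video_len)
        e = min(max(0.0, c + w / 2.0), video_len)
        noisy.append((s, e))
    return noisy


rng = random.Random(0)
gt_span = (10.0, 20.0)
for span in make_noisy_spans(gt_span, num_spans=4, noise_scale=0.2, video_len=60.0, rng=rng):
    # In training, each noisy span would be a decoder input whose target is gt_span.
    print(span)
```

Under these assumptions, `noise_scale` is the "controlled noise scale" knob. A value of 0 returns the ground truth unchanged, and larger values produce harder denoising targets. A span pushed entirely outside the video can collapse to zero length after clipping, so a real pipeline might resample or discard such cases.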

Code

frostinassiky/denoiseloc (official)

Tasks

Action Detection

Decoder

Denoising

Moment Retrieval

Video Grounding

Datasets

QVHighlights

MAD

Results from the Paper

Ranked #1 on Video Grounding on MAD
