TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Weakly Supervised Action Localization	ActivityNet-1.3	PivoTAL	mAP@0.5	45.1	# 1
Weakly Supervised Action Localization	ActivityNet-1.3	PivoTAL	mAP@0.5:0.95	28.1	# 1
Weakly Supervised Action Localization	THUMOS 2014	PivoTAL	mAP@0.5	42.8	# 3
Weakly Supervised Action Localization	THUMOS 2014	PivoTAL	mAP@0.1:0.7	49.6	# 3
Weakly Supervised Action Localization	THUMOS 2014	PivoTAL	mAP@0.1:0.5	60.1	# 3
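
The metric names in the table follow the usual temporal action localization convention: the number after `@` is a temporal IoU (tIoU) threshold, and a range such as `0.5:0.95` denotes mAP averaged over several thresholds in that range. As a minimal sketch of the overlap measure itself, assuming segments are given as `(start, end)` pairs in seconds (the function name `temporal_iou` is illustrative and not taken from the paper or any benchmark toolkit):

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    # Length of the overlap; zero when the segments are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```

Under a threshold of 0.5, a predicted segment counts as a match only if its tIoU with a ground-truth segment of the same class is at least 0.5; for example, `temporal_iou((0, 10), (5, 15))` is 5/15, which would not count.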

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pivotal-prior-driven-supervision-for-weakly/weakly-supervised-action-localization-on-1)](https://paperswithcode.com/sota/weakly-supervised-action-localization-on-1?p=pivotal-prior-driven-supervision-for-weakly)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pivotal-prior-driven-supervision-for-weakly/weakly-supervised-action-localization-on)](https://paperswithcode.com/sota/weakly-supervised-action-localization-on?p=pivotal-prior-driven-supervision-for-weakly)`

PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization

CVPR 2023 · Mamshad Nayeem Rizve, Gaurav Mittal, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, Mei Chen

Weakly-supervised Temporal Action Localization (WTAL) attempts to localize the actions in untrimmed videos using only video-level supervision. Most recent works approach WTAL from a localization-by-classification perspective: they classify each video frame and then apply a manually designed post-processing pipeline to aggregate the per-frame action predictions into action snippets. Under this perspective, the model lacks any explicit understanding of action boundaries and tends to focus only on the most discriminative parts of the video, resulting in incomplete action localization. To address this, we present PivoTAL, Prior-driven Supervision for Weakly-supervised Temporal Action Localization, which approaches WTAL from a localization-by-localization perspective by learning to localize the action snippets directly. To this end, PivoTAL leverages the underlying spatio-temporal regularities in videos in the form of an action-specific scene prior, an action snippet generation prior, and a learnable Gaussian prior to supervise the localization-based training. PivoTAL shows a significant improvement (of at least 3% avg mAP) over all existing methods on the benchmark datasets THUMOS-14 and ActivityNet-v1.3.
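
To make the localization-by-classification baseline described in the abstract concrete, the following is a minimal sketch of one common form of post-processing: thresholding per-frame action scores and merging consecutive above-threshold frames into snippets. It is an illustration only, not PivoTAL's method and not the pipeline of any specific prior work; the function name `scores_to_snippets` and the default threshold of 0.5 are assumptions.

```python
def scores_to_snippets(scores, threshold=0.5):
    """Group consecutive frames scoring >= threshold into (start, end) snippets.

    `end` is exclusive, so a snippet covers frames start .. end - 1.
    """
    snippets = []
    start = None  # index where the currently open snippet began, if any
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i                      # open a new snippet
        elif s < threshold and start is not None:
            snippets.append((start, i))    # close the open snippet
            start = None
    if start is not None:                  # snippet runs to the last frame
        snippets.append((start, len(scores)))
    return snippets


frame_scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.6]
print(scores_to_snippets(frame_scores))  # [(1, 3), (4, 6)]
```

Because the snippet boundaries here depend entirely on a hand-picked threshold applied to classification scores, a pipeline of this kind has no learned notion of where an action starts or ends, which is the limitation the abstract attributes to this family of methods.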

Tasks

Action Localization

Temporal Action Localization

Weakly Supervised Action Localization

Weakly Supervised Temporal Action Localization

Datasets

ActivityNet

THUMOS14

Results from the Paper

Ranked #1 on Weakly Supervised Action Localization on ActivityNet-1.3
