Weakly Supervised Action Selection Learning in Video
Localizing actions in video is a core task in computer vision. The weakly supervised temporal localization problem investigates whether this task can be adequately solved with only video-level labels, significantly reducing the amount of expensive and error-prone annotation that is required. A common approach is to train a frame-level classifier where frames with the highest class probability are selected to make a video-level prediction. Frame-level activations are then used for localization. However, the absence of frame-level annotations causes the classifier to impart class bias on every frame. To address this, we propose the Action Selection Learning (ASL) approach to capture the general concept of action, a property we refer to as "actionness". Under ASL, the model is trained with a novel class-agnostic task to predict which frames will be selected by the classifier. Empirically, we show that ASL outperforms leading baselines on two popular benchmarks, THUMOS-14 and ActivityNet-1.2, with 10.3% and 5.7% relative improvement respectively. We further analyze the properties of ASL and demonstrate the importance of actionness. Full code for this work is available here: https://github.com/layer6ai-labs/ASL.
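To make the abstract's two-headed idea concrete, here is a minimal PyTorch sketch of a frame-level classifier with top-k pooling for the video-level prediction, plus a class-agnostic actionness head trained to predict which frames the classifier selects. The layer shapes, the value of k, the selection rule, and the equal loss weighting are illustrative assumptions, not the authors' exact architecture; see the linked repository for the real implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASLSketch(nn.Module):
    """Sketch: per-frame classifier + class-agnostic actionness head.
    feat_dim, num_classes, and k are assumed values for illustration."""

    def __init__(self, feat_dim=2048, num_classes=20, k=8):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_classes)   # per-frame class logits
        self.actionness = nn.Linear(feat_dim, 1)              # class-agnostic score
        self.k = k

    def forward(self, feats):
        # feats: (T, feat_dim) frame features for one untrimmed video
        cls_logits = self.classifier(feats)                   # (T, C)
        act_logits = self.actionness(feats).squeeze(-1)       # (T,)
        return cls_logits, act_logits

def asl_losses(model, feats, video_label):
    """video_label: (C,) multi-hot video-level label."""
    cls_logits, act_logits = model(feats)
    k = min(model.k, feats.size(0))

    # MIL-style video prediction: average the top-k frame logits per class.
    video_logits = torch.topk(cls_logits, k=k, dim=0).values.mean(dim=0)
    cls_loss = F.binary_cross_entropy_with_logits(video_logits, video_label)

    # Actionness target: 1 for frames the classifier selects (top-k of any
    # ground-truth class), 0 otherwise -- a simplified selection rule.
    with torch.no_grad():
        selected = torch.zeros(feats.size(0))
        for c in video_label.nonzero(as_tuple=True)[0]:
            idx = torch.topk(cls_logits[:, c], k=k).indices
            selected[idx] = 1.0
    act_loss = F.binary_cross_entropy_with_logits(act_logits, selected)
    return cls_loss + act_loss

# Toy usage: one video with 100 frames and 20 possible classes.
model = ASLSketch()
feats = torch.randn(100, 2048)
label = torch.zeros(20)
label[3] = 1.0
loss = asl_losses(model, feats, label)
loss.backward()
```

The key design point this sketch captures is that the actionness head never sees class labels: it only learns to predict the classifier's frame selections, which is what lets it generalize across classes at localization time.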
Results from the Paper
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Weakly Supervised Action Localization | ActivityNet-1.2 | ASL | mAP@0.5 | 40.2 | # 8
Weakly Supervised Action Localization | ActivityNet-1.2 | ASL | Mean mAP | 25.8 | # 7
Weakly Supervised Action Localization | FineAction | ASL | mAP | 3.30 | # 4
Weakly Supervised Action Localization | FineAction | ASL | mAP IoU@0.5 | 2.68 | # 4
Weakly Supervised Action Localization | FineAction | ASL | mAP IoU@0.75 | 0.81 | # 4
Weakly Supervised Action Localization | FineAction | ASL | mAP IoU@0.95 | 3.30 | # 1
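The mAP figures above score predicted temporal segments against ground truth at a given temporal IoU threshold. The sketch below shows one common way this is computed (greedy one-to-one matching in descending score order, then area under the precision-recall curve); the exact matching and interpolation rules vary by benchmark, so treat this as an illustration rather than the official evaluation code.

```python
import numpy as np

def temporal_iou(pred, gt):
    """IoU between two 1-D segments given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, iou_thresh=0.5):
    """preds: list of (start, end, score); gts: list of (start, end)."""
    preds = sorted(preds, key=lambda p: -p[2])       # highest score first
    matched = [False] * len(gts)
    tp = np.zeros(len(preds))
    fp = np.zeros(len(preds))
    for i, (s, e, _) in enumerate(preds):
        best_j, best_iou = -1, iou_thresh
        for j, gt in enumerate(gts):
            iou = temporal_iou((s, e), gt)
            if not matched[j] and iou >= best_iou:   # best unmatched GT above threshold
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched[best_j] = True
            tp[i] = 1
        else:
            fp[i] = 1
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(len(gts), 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-8)
    # AP as the area under the precision-recall curve (simple Riemann sum).
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

# Toy example: two ground-truth segments, three scored predictions.
gts = [(2.0, 5.0), (8.0, 10.0)]
preds = [(2.1, 4.8, 0.9), (7.5, 9.5, 0.8), (0.0, 1.0, 0.3)]
print(average_precision(preds, gts, iou_thresh=0.5))  # 1.0 for this toy case
```

Mean mAP, as reported for ActivityNet-1.2, averages this quantity over a range of IoU thresholds (and over classes), so it rewards localizations that remain accurate as the overlap requirement tightens.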