TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Segmentation	Breakfast	AdaFocus (newly extracted I3D-features, LT-Context model)	F1@10%	82.1	# 1
Action Segmentation	Breakfast	AdaFocus (newly extracted I3D-features, LT-Context model)	F1@50%	67.5	# 1
Action Segmentation	Breakfast	AdaFocus (newly extracted I3D-features, LT-Context model)	Acc	78.0	# 1
Action Segmentation	Breakfast	AdaFocus (newly extracted I3D-features, LT-Context model)	Edit	78.3	# 4
Action Segmentation	Breakfast	AdaFocus (newly extracted I3D-features, LT-Context model)	F1@25%	79.0	# 1
Long-video Activity Recognition	Breakfast	AdaFocus (I3D-Breakfast-Pretrain-feature, GHRM)	mAP	69.6	# 4
Long-video Activity Recognition	Breakfast	AdaFocus (MViT-Breakfast-Pretrain-feature, Timeception)	mAP	79.2	# 2
Long-video Activity Recognition	Breakfast	AdaFocus (I3D-Breakfast-Pretrain-feature, Timeception)	mAP	70.4	# 3
Long-video Activity Recognition	Breakfast	AdaFocus (MViT-Breakfast-Pretrain-feature, GHRM)	mAP	79.5	# 1
Weakly Supervised Action Segmentation (Action Set)	Breakfast	AdaFocus (newly extracted I3D-features, POC model)	Acc	49.6	# 1
Action Classification	Charades	AdaFocus (weak supervision, MViT-B-24, 32x3)	mAP	47.8	# 13
Action Classification	Charades	AdaFocus (weak supervision, Slowfast-R50, 16x8)	mAP	39.3	# 35
Action Classification	Charades	AdaFocus (weak supervision, X3D-L, 32x3)	mAP	41.2	# 29
Action Classification	Charades	AdaFocus (weak supervision, MViT-B-K400-pretrain, 16x4)	mAP	41.4	# 28
Temporal Sentence Grounding	Charades-STA	AdaFocus (Full, I3D-Charades-Pretrain-feature, MMN model)	R1@0.5	56.7	# 2
Temporal Sentence Grounding	Charades-STA	AdaFocus (Full, I3D-Charades-Pretrain-feature, MMN model)	R1@0.7	35.6	# 2
Temporal Sentence Grounding	Charades-STA	AdaFocus (Full, I3D-Charades-Pretrain-feature, MMN model)	R5@0.7	65.0	# 2
Temporal Sentence Grounding	Charades-STA	AdaFocus (Full, I3D-Charades-Pretrain-feature, MMN model)	R5@0.5	87.9	# 3
Temporal Sentence Grounding	Charades-STA	AdaFocus (Weak, I3D-Charades-Pretrain-feature, CPL model)	R1@0.5	49.1	# 7
Temporal Sentence Grounding	Charades-STA	AdaFocus (Weak, I3D-Charades-Pretrain-feature, CPL model)	R1@0.7	22.4	# 6
Temporal Sentence Grounding	Charades-STA	AdaFocus (Weak, I3D-Charades-Pretrain-feature, CPL model)	R5@0.7	51.8	# 7
Temporal Sentence Grounding	Charades-STA	AdaFocus (Weak, I3D-Charades-Pretrain-feature, CPL model)	R5@0.5	84.2	# 8
Temporal Sentence Grounding	Charades-STA	AdaFocus (Weak, MViT-Charades-Pretrain-feature, CPL model)	R1@0.5	51.7	# 4
Temporal Sentence Grounding	Charades-STA	AdaFocus (Weak, MViT-Charades-Pretrain-feature, CPL model)	R1@0.7	23.2	# 5
Temporal Sentence Grounding	Charades-STA	AdaFocus (Weak, MViT-Charades-Pretrain-feature, CPL model)	R5@0.7	52.6	# 6
Temporal Sentence Grounding	Charades-STA	AdaFocus (Weak, MViT-Charades-Pretrain-feature, CPL model)	R5@0.5	85.2	# 6
Temporal Sentence Grounding	Charades-STA	AdaFocus (Full, MViT-Charades-Pretrain-feature, MMN model)	R1@0.5	62.4	# 1
Temporal Sentence Grounding	Charades-STA	AdaFocus (Full, MViT-Charades-Pretrain-feature, MMN model)	R1@0.7	38.6	# 1
Temporal Sentence Grounding	Charades-STA	AdaFocus (Full, MViT-Charades-Pretrain-feature, MMN model)	R5@0.7	66.4	# 1
Temporal Sentence Grounding	Charades-STA	AdaFocus (Full, MViT-Charades-Pretrain-feature, MMN model)	R5@0.5	89.4	# 1
Temporal Sentence Grounding	Charades-STA	AdaFocus (Semi-weak, MViT-Charades-Pretrain-feature, D3G model)	R1@0.5	50.1	# 5
Temporal Sentence Grounding	Charades-STA	AdaFocus (Semi-weak, MViT-Charades-Pretrain-feature, D3G model)	R1@0.7	21.8	# 7
Temporal Sentence Grounding	Charades-STA	AdaFocus (Semi-weak, MViT-Charades-Pretrain-feature, D3G model)	R5@0.7	54.6	# 5
Temporal Sentence Grounding	Charades-STA	AdaFocus (Semi-weak, MViT-Charades-Pretrain-feature, D3G model)	R5@0.5	86.1	# 4
Temporal Sentence Grounding	Charades-STA	AdaFocus (Semi-weak, I3D-Charades-Pretrain-feature, D3G model)	R1@0.5	46.9	# 9
Temporal Sentence Grounding	Charades-STA	AdaFocus (Semi-weak, I3D-Charades-Pretrain-feature, D3G model)	R1@0.7	21.1	# 9
Temporal Sentence Grounding	Charades-STA	AdaFocus (Semi-weak, I3D-Charades-Pretrain-feature, D3G model)	R5@0.7	49.2	# 10
Temporal Sentence Grounding	Charades-STA	AdaFocus (Semi-weak, I3D-Charades-Pretrain-feature, D3G model)	R5@0.5	79.3	# 11
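
The table above reports several standard metrics. As a reference for how they are usually computed, here is a minimal pure-Python sketch of frame accuracy (Acc), segmental Edit and F1@k (action segmentation), R{n}@{m} (temporal sentence grounding), and non-interpolated mAP (multi-label classification). These follow common formulations, not the paper's evaluation code. Official benchmark scripts may differ in details such as background-class handling, tie-breaking, and AP interpolation, and all function names here are illustrative.

```python
"""Illustrative versions of the metrics in the results table (common
formulations, not the official evaluation scripts)."""

from itertools import groupby


def temporal_iou(a, b):
    """IoU of two 1-D intervals given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_n(predictions, ground_truths, n, iou_threshold):
    """R{n}@{m}: % of queries whose top-n ranked moments include one with IoU >= m."""
    hits = sum(
        any(temporal_iou(p, gt) >= iou_threshold for p in preds[:n])
        for preds, gt in zip(predictions, ground_truths)
    )
    return 100.0 * hits / len(ground_truths)


def frame_accuracy(pred_frames, gt_frames):
    """Acc: % of frames whose predicted label matches the ground truth."""
    correct = sum(p == g for p, g in zip(pred_frames, gt_frames))
    return 100.0 * correct / len(gt_frames)


def to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments (end exclusive)."""
    segments, start = [], 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def edit_score(pred_frames, gt_frames):
    """Edit: 100 * (1 - normalized Levenshtein distance) between segment label sequences."""
    p = [label for label, _, _ in to_segments(pred_frames)]
    g = [label for label, _, _ in to_segments(gt_frames)]
    prev = list(range(len(g) + 1))
    for i, pl in enumerate(p, 1):
        cur = [i]
        for j, gl in enumerate(g, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (pl != gl)))
        prev = cur
    return 100.0 * (1 - prev[-1] / max(len(p), len(g)))


def segmental_f1(pred_frames, gt_frames, iou_threshold):
    """F1@k: a predicted segment is a true positive if its best-overlapping
    same-label ground-truth segment has IoU >= k and is not yet matched."""
    pred, gt = to_segments(pred_frames), to_segments(gt_frames)
    matched = [False] * len(gt)
    tp = fp = 0
    for label, s, e in pred:
        ious = [temporal_iou((s, e), (gs, ge)) if gl == label else 0.0 for gl, gs, ge in gt]
        best = max(range(len(gt)), key=ious.__getitem__)
        if ious[best] >= iou_threshold and not matched[best]:
            tp += 1
            matched[best] = True
        else:
            fp += 1
    fn = matched.count(False)  # ground-truth segments never matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


def mean_average_precision(scores, labels):
    """mAP: per-class non-interpolated AP over videos ranked by score,
    averaged over classes that have at least one positive video."""
    aps = []
    for c in range(len(scores[0])):
        pairs = ((s[c], y[c]) for s, y in zip(scores, labels))
        ranked = sorted(pairs, key=lambda pair: pair[0], reverse=True)
        hits, precisions = 0, []
        for rank, (_, positive) in enumerate(ranked, 1):
            if positive:
                hits += 1
                precisions.append(hits / rank)  # precision at each positive
        if precisions:
            aps.append(sum(precisions) / len(precisions))
    return 100.0 * sum(aps) / len(aps)
```

For example, with ground-truth frames `aaabbb` and predicted frames `aabbbb`, `segmental_f1` gives 100 at a threshold of 0.5 but 50 at 0.7, because the shortened `a` segment only reaches an IoU of 2/3. This is why F1@50% is typically lower than F1@10% in the Breakfast rows.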

AdaFocus: Towards End-to-end Weakly Supervised Learning for Long-Video Action Understanding

28 Nov 2023 · Jiaming Zhou, Hanjun Li, Kun-Yu Lin, Junwei Liang

Developing end-to-end models for long-video action understanding tasks presents significant computational and memory challenges. Existing works generally build models on long-video features extracted by off-the-shelf action recognition models, which are trained on short-video datasets from different domains, so the extracted features suffer from domain discrepancy. To avoid this, action recognition models can be trained end-to-end on clips that are trimmed from long videos and labeled using action interval annotations. Such fully supervised annotations are expensive to collect. Thus, a weakly supervised method is needed for long-video action understanding at scale. Under the weak supervision setting, action labels are provided for the whole video without the precise start and end times of each action clip. To this end, we propose the AdaFocus framework. AdaFocus estimates the spike-actionness and temporal positions of actions, enabling it to adaptively focus on action clips that facilitate better training without the need for precise annotations. Experiments on three long-video datasets show its effectiveness. Remarkably, on two of the datasets, models trained with AdaFocus under weak supervision outperform those trained under full supervision. Furthermore, we form a weakly supervised feature extraction pipeline with AdaFocus, which enables significant improvements on three long-video action understanding tasks.
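
To make the weak-supervision setting concrete, here is a minimal Python sketch of the general idea of focusing training on high-actionness clips when only video-level labels are available. It is not the AdaFocus algorithm: the `Clip` type, the `actionness` scores (assumed to come from some clip-level model), and `select_training_clips` with its `k` and `min_score` parameters are all hypothetical, and a fixed top-k rule stands in for the paper's adaptive focusing.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    start: float       # clip start time in seconds
    end: float         # clip end time in seconds
    actionness: float  # hypothetical score in [0, 1] from some clip-level model


def select_training_clips(clips, video_labels, k=2, min_score=0.5):
    """Hypothetical top-k rule: keep up to k clips with the highest actionness
    (and at least min_score), each paired with the video-level labels, since
    weak supervision gives no start/end times for individual actions."""
    ranked = sorted(clips, key=lambda c: c.actionness, reverse=True)
    return [(c, list(video_labels)) for c in ranked[:k] if c.actionness >= min_score]


if __name__ == "__main__":
    clips = [Clip(0, 2, 0.1), Clip(2, 4, 0.9), Clip(4, 6, 0.7), Clip(6, 8, 0.3)]
    for clip, labels in select_training_clips(clips, ["pour_milk"]):
        print(clip.start, clip.end, labels)
```

Here the clips spanning 2-4 s and 4-6 s are kept and both inherit the video-level label. A learned, adaptive selection would replace the fixed `k` and threshold; they are fixed only to keep the sketch short.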

Code

No code implementations yet.

Tasks

Action Classification

Action Recognition

Action Segmentation

Action Understanding

Long-video Activity Recognition

Temporal Sentence Grounding

Weakly Supervised Action Segmentation (Action Set)

Weakly-supervised Learning

Datasets

Charades

Charades-STA

Breakfast

MultiTHUMOS

Results from the Paper

Ranked #1 on Long-video Activity Recognition on Breakfast

Methods

No methods listed for this paper.
