Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query

Actor and action video segmentation from a natural language query aims to segment the actor and its action in a video according to an input textual description. Previous works mostly focus on learning simple correlations between the two heterogeneous modalities of vision and language via dynamic convolution or fully convolutional classification. However, they ignore the linguistic variation of natural language queries and have difficulty modeling global visual context, which leads to unsatisfactory segmentation performance. To address these issues, we propose an asymmetric cross-guided attention network for actor and action video segmentation from natural language query. Specifically, the network consists of vision-guided language attention, which reduces the linguistic variation of the input query, and language-guided vision attention, which simultaneously incorporates query-focused global visual context. Moreover, we adopt a multi-resolution fusion scheme and a weighted loss for foreground and background pixels to obtain further performance gains. Extensive experiments on the Actor-Action Dataset (A2D) Sentences and J-HMDB Sentences show that our proposed approach notably outperforms state-of-the-art methods.
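The two attention directions described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the authors' implementation: the mean-pooling of visual features, the shared feature dimension, and the function names are all illustrative assumptions.

```python
# Sketch of asymmetric cross-guided attention (illustrative assumptions only):
# vision-guided language attention re-weights word features with a visual query;
# language-guided vision attention gathers query-focused global visual context.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_guided_attention(visual, words):
    """visual: (HW, d) spatial visual features; words: (T, d) word features."""
    # Vision-guided language attention: pool visual features into one query
    # (mean pooling is an assumption) and re-weight the words, reducing the
    # effect of linguistic variation in the input query.
    v_query = visual.mean(axis=0)             # (d,)
    word_weights = softmax(words @ v_query)   # (T,) attention over words
    sentence = word_weights @ words           # (d,) attended sentence feature

    # Language-guided vision attention: use the attended sentence feature to
    # score every spatial position and pool query-focused global context.
    spatial_weights = softmax(visual @ sentence)  # (HW,) attention over space
    context = spatial_weights @ visual            # (d,) global visual context
    return sentence, context

rng = np.random.default_rng(0)
sent, ctx = cross_guided_attention(rng.normal(size=(64, 16)),
                                   rng.normal(size=(8, 16)))
```

In the paper these attended features feed a multi-resolution fusion and segmentation head; here the sketch stops at the two cross-guided attention outputs.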

Referring Expression Segmentation on J-HMDB (model: ACGA)

| Metric        | Value | Rank |
|---------------|-------|------|
| Precision@0.5 | 0.756 | #13  |
| Precision@0.6 | 0.564 | #16  |
| Precision@0.7 | 0.287 | #16  |
| Precision@0.8 | 0.034 | #18  |
| Precision@0.9 | 0.000 | #11  |
| AP            | 0.289 | #12  |
| IoU overall   | 0.576 | #14  |
| IoU mean      | 0.584 | #11  |

Results from Other Papers


Referring Expression Segmentation on A2D Sentences (model: ACGA)

| Metric        | Value | Rank |
|---------------|-------|------|
| Precision@0.5 | 0.557 | #20  |
| Precision@0.6 | 0.459 | #20  |
| Precision@0.7 | 0.319 | #22  |
| Precision@0.8 | 0.16  | #22  |
| Precision@0.9 | 0.02  | #23  |
| AP            | 0.274 | #18  |
| IoU overall   | 0.601 | #21  |
| IoU mean      | 0.490 | #21  |
