ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation

19 Mar 2021  ·  Chen Liang, Yu Wu, Yawei Luo, Yi Yang ·

Text-based video segmentation is a challenging task that segments out the objects referred to by natural language descriptions in videos. It essentially requires semantic comprehension and fine-grained video understanding. Existing methods introduce language representation into segmentation models in a bottom-up manner, which merely conducts vision-language interaction within the local receptive fields of ConvNets. We argue that such interaction is insufficient, since the model can barely construct region-level relationships from partial observations, which runs contrary to the way natural language referring expressions are formulated. In fact, people usually describe a target object through its relations to other objects, which may not be easy to understand without seeing the whole video. To address this issue, we introduce a novel top-down approach that imitates how humans segment an object under language guidance. We first identify all candidate objects in the video and then choose the referred one by parsing the relations among those high-level objects. Three kinds of object-level relations are investigated for precise relationship understanding, i.e., positional relations, text-guided semantic relations, and temporal relations. Extensive experiments on the A2D Sentences and J-HMDB Sentences datasets show that our method outperforms state-of-the-art methods by a large margin. Qualitative results also show that our predictions are more explainable.
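The abstract does not include an implementation, but the top-down idea it describes (enumerate candidate objects first, then select the referred one by scoring object-level relations) can be illustrated with a minimal sketch. All names, data structures, and scoring rules below are hypothetical simplifications, not the paper's actual method; the temporal relation is omitted for brevity.

```python
# Hypothetical sketch of the top-down selection idea: score each
# candidate object against the referring expression using simple
# positional and semantic relation cues, then pick the best match.
# This is illustrative only, not the ClawCraneNet architecture.
from dataclasses import dataclass

@dataclass
class Candidate:
    label: str      # semantic category of the candidate object, e.g. "dog"
    center: tuple   # normalized (x, y) box center within the frame

def positional_score(cand: Candidate, expression: str) -> float:
    """Reward candidates whose position matches spatial words in the text."""
    score = 0.0
    if "left" in expression and cand.center[0] < 0.5:
        score += 1.0
    if "right" in expression and cand.center[0] >= 0.5:
        score += 1.0
    return score

def semantic_score(cand: Candidate, expression: str) -> float:
    """Reward candidates whose category name appears in the expression."""
    return 1.0 if cand.label in expression else 0.0

def choose_referred(candidates: list, expression: str) -> Candidate:
    """Top-down selection: score every candidate object, return the best."""
    return max(
        candidates,
        key=lambda c: positional_score(c, expression) + semantic_score(c, expression),
    )

cands = [
    Candidate("dog", (0.2, 0.5)),
    Candidate("dog", (0.8, 0.5)),
    Candidate("person", (0.5, 0.4)),
]
print(choose_referred(cands, "the dog on the left").center)  # (0.2, 0.5)
```

The key design point mirrored here is that relations are parsed over whole, high-level objects rather than within local receptive fields, so cues such as "on the left" can disambiguate between two candidates of the same category.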


Results from the Paper


Task: Referring Expression Segmentation  ·  Model: ClawCraneNet

A2D Sentences
  Metric         Value   Global Rank
  Precision@0.5  0.704   #9
  Precision@0.6  0.677   #8
  Precision@0.7  0.617   #6
  Precision@0.8  0.489   #5
  Precision@0.9  0.171   #5
  IoU overall    0.644   #17
  IoU mean       0.655   #5

J-HMDB
  Metric         Value   Global Rank
  Precision@0.5  0.880   #6
  Precision@0.6  0.796   #6
  Precision@0.7  0.566   #7
  Precision@0.8  0.147   #7
  Precision@0.9  0.002   #4
  IoU overall    0.644   #8
  IoU mean       0.655   #7
