TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Referring Expression Segmentation	G-Ref test A	MaIL	Overall IoU	62.87	# 1
Referring Expression Segmentation	G-Ref test B	MaIL	Overall IoU	61.81	# 1
Referring Expression Segmentation	G-Ref val	MaIL	Overall IoU	62.45	# 1
Referring Expression Segmentation	RefCOCO testA	MaIL	Overall IoU	71.71	# 14
Referring Expression Segmentation	RefCOCO+ testA	MaIL	Overall IoU	65.92	# 12
Referring Expression Segmentation	RefCOCO testB	MaIL	Overall IoU	66.92	# 8
Referring Expression Segmentation	RefCOCO+ testB	MaIL	Overall IoU	56.06	# 10
Referring Expression Segmentation	RefCOCO val	MaIL	Overall IoU	70.13	# 15
Referring Expression Segmentation	RefCOCO+ val	MaIL	Overall IoU	62.23	# 13
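
The page does not define the Overall IoU metric reported above. In referring segmentation work it is commonly computed as the intersection summed over all test samples divided by the union summed over all test samples, expressed in percent. The sketch below assumes that definition and contrasts it with a per-sample mean IoU. The function names and the list-of-rows mask format are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask as rows of 0/1 values (assumed format)


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter, union


def overall_iou(preds: List[Mask], targets: List[Mask]) -> float:
    """Cumulative IoU in percent: summed intersections over summed unions."""
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
    return 100.0 * total_inter / total_union if total_union else 0.0


def mean_iou(preds: List[Mask], targets: List[Mask]) -> float:
    """Per-sample IoU averaged over samples, in percent (shown for contrast)."""
    ratios = []
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        ratios.append(inter / union if union else 0.0)
    return 100.0 * sum(ratios) / len(ratios) if ratios else 0.0


# Two toy 2x2 samples: the first prediction over-segments, the second is exact.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
print(round(overall_iou(preds, targets), 2))  # 66.67 (2 / 3 pixels)
print(round(mean_iou(preds, targets), 2))     # 75.0 (mean of 0.5 and 1.0)
```

Because intersections and unions are pooled before dividing, Overall IoU weights large objects more heavily than a per-sample mean does, so the two numbers are not interchangeable when comparing methods.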

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mail-a-unified-mask-image-language-trimodal/referring-expression-segmentation-on-g-ref-1)](https://paperswithcode.com/sota/referring-expression-segmentation-on-g-ref-1?p=mail-a-unified-mask-image-language-trimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mail-a-unified-mask-image-language-trimodal/referring-expression-segmentation-on-g-ref-2)](https://paperswithcode.com/sota/referring-expression-segmentation-on-g-ref-2?p=mail-a-unified-mask-image-language-trimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mail-a-unified-mask-image-language-trimodal/referring-expression-segmentation-on-g-ref)](https://paperswithcode.com/sota/referring-expression-segmentation-on-g-ref?p=mail-a-unified-mask-image-language-trimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mail-a-unified-mask-image-language-trimodal/referring-expression-segmentation-on-refcoco-2)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-2?p=mail-a-unified-mask-image-language-trimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mail-a-unified-mask-image-language-trimodal/referring-expression-segmentation-on-refcoco-5)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-5?p=mail-a-unified-mask-image-language-trimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mail-a-unified-mask-image-language-trimodal/referring-expression-segmentation-on-refcoco-4)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-4?p=mail-a-unified-mask-image-language-trimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mail-a-unified-mask-image-language-trimodal/referring-expression-segmentation-on-refcoco-3)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-3?p=mail-a-unified-mask-image-language-trimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mail-a-unified-mask-image-language-trimodal/referring-expression-segmentation-on-refcoco-1)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-1?p=mail-a-unified-mask-image-language-trimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mail-a-unified-mask-image-language-trimodal/referring-expression-segmentation-on-refcoco)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco?p=mail-a-unified-mask-image-language-trimodal)`

MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation

21 Nov 2021 · Zizhang Li, Mengmeng Wang, Jianbiao Mei, Yong Liu

Referring image segmentation is a typical multi-modal task that aims to generate a binary mask for the referent described by a given language expression. Prior art adopts a bimodal solution, treating images and language as two modalities within an encoder-fusion-decoder pipeline. However, this pipeline is sub-optimal for the target task for two reasons. First, it fuses only the high-level features produced separately by uni-modal encoders, which hinders sufficient cross-modal learning. Second, the uni-modal encoders are pre-trained independently, which introduces inconsistency between the pre-trained uni-modal tasks and the target multi-modal task. In addition, this pipeline often ignores or makes little use of intuitively beneficial instance-level features. To relieve these problems, we propose MaIL, a more concise encoder-decoder pipeline with a Mask-Image-Language trimodal encoder. Specifically, MaIL unifies the uni-modal feature extractors and their fusion model into a deep modality interaction encoder, facilitating sufficient feature interaction across different modalities. Meanwhile, MaIL directly avoids the second limitation, since uni-modal encoders are no longer needed. Moreover, for the first time, we propose introducing instance masks as an additional modality, which explicitly intensifies instance-level features and promotes finer segmentation results. The proposed MaIL sets a new state of the art on all frequently used referring image segmentation datasets, including RefCOCO, RefCOCO+, and G-Ref, with significant gains of 3%-10% over the previous best methods. Code will be released soon.
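
The abstract describes a single encoder in which mask, image, and language features interact, but gives no implementation details. The toy sketch below shows one plausible reading, not the authors' code: word vectors, image patches, and instance-mask patches are each projected to a shared width, tagged with a modality-type vector, concatenated into one token sequence, and passed through a single self-attention step so that every token can attend across modalities. All names, sizes, the random weights, and the identity query/key/value projections are assumptions made for illustration.

```python
import math
import random
from typing import List

Vector = List[float]


def patchify(grid: List[Vector], patch: int) -> List[Vector]:
    """Split a 2-D grid into non-overlapping patch x patch blocks, each flattened."""
    height, width = len(grid), len(grid[0])
    return [
        [grid[r][c] for r in range(top, top + patch) for c in range(left, left + patch)]
        for top in range(0, height, patch)
        for left in range(0, width, patch)
    ]


def embed(vec: Vector, weight: List[Vector], type_vec: Vector) -> Vector:
    """Project vec (one weight row per output dim) and add a modality-type vector."""
    return [sum(w * x for w, x in zip(row, vec)) + t for row, t in zip(weight, type_vec)]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head self-attention with identity Q/K/V projections (toy simplification)."""
    dim = len(tokens[0])
    fused = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        norm = sum(weights)
        fused.append([sum(w / norm * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return fused


def encode_trimodal(words, image, mask, patch=2, dim=8, seed=0):
    """Build one language + image + mask token sequence and mix it with attention."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    # Hypothetical per-modality projections and type vectors (random, untrained).
    word_proj = rand_matrix(dim, len(words[0]))
    image_proj = rand_matrix(dim, patch * patch)
    mask_proj = rand_matrix(dim, patch * patch)
    type_vecs = rand_matrix(3, dim)

    tokens = [embed(w, word_proj, type_vecs[0]) for w in words]
    tokens += [embed(p, image_proj, type_vecs[1]) for p in patchify(image, patch)]
    tokens += [embed(p, mask_proj, type_vecs[2]) for p in patchify(mask, patch)]
    return self_attention(tokens)


words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two toy word vectors
image = [[float(r * 4 + c) / 16 for c in range(4)] for r in range(4)]  # 4x4 grayscale
mask = [[1.0 if r < 2 and c < 2 else 0.0 for c in range(4)] for r in range(4)]  # one instance
fused = encode_trimodal(words, image, mask)
print(len(fused), len(fused[0]))  # 10 8: 2 word + 4 image + 4 mask tokens, width 8
```

The point of the sketch is the contrast with an encoder-fusion-decoder pipeline: no separate uni-modal encoder runs before fusion, so cross-modal interaction happens from the first layer onward.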

Code

No code implementations yet.

Tasks

Image Segmentation

Referring Expression Segmentation

Segmentation

Semantic Segmentation

Datasets

RefCOCO

Results from the Paper

Ranked #1 on Referring Expression Segmentation on G-Ref test B
