Referring Expression Segmentation
68 papers with code • 25 benchmarks • 11 datasets
The task aims to label the pixels of an image or video that belong to an object instance referred to by a linguistic expression. In particular, the referring expression (RE) must identify a single object in a discourse or scene (the referent); that is, REs unambiguously identify the target instance.
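Methods for this task are typically evaluated by the intersection-over-union (IoU) between the predicted and ground-truth segmentation masks. A minimal sketch of that metric (the toy masks and the `mask_iou` helper below are illustrative, not taken from any listed paper):

```python
def mask_iou(pred, gt):
    """Intersection-over-union between two binary masks.

    pred, gt: 2D lists (H x W) of 0/1 values.
    """
    inter = sum(p & g for row_p, row_g in zip(pred, gt)
                for p, g in zip(row_p, row_g))
    union = sum(p | g for row_p, row_g in zip(pred, gt)
                for p, g in zip(row_p, row_g))
    # Two empty masks are a perfect match by convention.
    return inter / union if union else 1.0

# Toy 4x4 example: the predicted mask only partially covers the referent.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
gt   = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(mask_iou(pred, gt))  # 2 shared pixels / 6 in the union -> 0.333...
```

Benchmarks on the datasets below usually report mean IoU over all expressions, sometimes alongside precision at fixed IoU thresholds.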
Most implemented papers
MAttNet: Modular Attention Network for Referring Expression Comprehension
In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.
Actor and Action Video Segmentation from a Sentence
This paper strives for pixel-level segmentation of actors and their actions in video content.
Referring Image Segmentation via Recurrent Refinement Networks
We address the problem of image segmentation from natural language descriptions.
Cross-Modal Self-Attention Network for Referring Image Segmentation
The paper proposes a gated multi-level fusion module that controls the information flow of features at different levels.
Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query
To address these issues, we propose an asymmetric cross-guided attention network for actor and action video segmentation from natural language query.
Referring Expression Object Segmentation with Caption-Aware Consistency
To this end, we propose an end-to-end trainable comprehension network that consists of language and visual encoders to extract feature representations from both domains.
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).
Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters
Our experiments reveal that using language to control the filters for bottom-up visual processing in addition to top-down attention leads to better results on both tasks and achieves competitive performance.
PhraseCut: Language-based Image Segmentation in the Wild
We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs.
Referring Image Segmentation via Cross-Modal Progressive Comprehension
In addition to the Cross-Modal Progressive Comprehension (CMPC) module, we further leverage a simple yet effective Text-Guided Feature Exchange (TGFE) module to integrate the reasoned multimodal features from different levels with the guidance of textual information.