Referring Expression Segmentation
68 papers with code • 25 benchmarks • 11 datasets
The task aims to label the pixels of an image or video that belong to an object instance referred to by a linguistic expression. In particular, the referring expression (RE) must single out an individual object in a discourse or scene (the referent); that is, REs unambiguously identify the target instance.
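To make the input/output contract concrete, below is a minimal, purely illustrative sketch (not any published model from the papers listed here): a toy image encoder and a toy text encoder are fused, and the model predicts a per-pixel probability of belonging to the referred object. All module names and dimensions are hypothetical.

```python
# Illustrative sketch of the referring-expression-segmentation interface:
# inputs are an image and a tokenized referring expression, output is a
# per-pixel mask for the referred object. Toy encoders only.
import torch
import torch.nn as nn

class ToyRefSegModel(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        # Tiny convolutional image encoder (downsamples spatial resolution by 4).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Tiny text encoder: embed tokens, then mean-pool into one vector.
        self.text_encoder = nn.Embedding(vocab_size, dim)
        # Fuse language with every spatial location and predict a mask logit.
        self.mask_head = nn.Conv2d(2 * dim, 1, 1)

    def forward(self, image, tokens):
        feat = self.image_encoder(image)                 # (B, D, H/4, W/4)
        text = self.text_encoder(tokens).mean(dim=1)     # (B, D)
        text = text[:, :, None, None].expand(-1, -1, *feat.shape[2:])
        logits = self.mask_head(torch.cat([feat, text], dim=1))
        # Upsample to input resolution; sigmoid gives per-pixel probability.
        logits = nn.functional.interpolate(logits, size=image.shape[2:],
                                           mode="bilinear", align_corners=False)
        return logits.sigmoid().squeeze(1)               # (B, H, W)

model = ToyRefSegModel()
image = torch.rand(1, 3, 128, 128)          # RGB image
tokens = torch.randint(0, 1000, (1, 6))     # e.g. "the dog on the left"
mask = model(image, tokens)                 # (1, 128, 128), values in [0, 1]
print(mask.shape)
```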
Latest papers
GLaMM: Pixel Grounding Large Multimodal Model
In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.
Towards Omni-supervised Referring Expression Segmentation
To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled, and weakly labeled data (e.g., referring points or grounding boxes) for efficient RES training.
Tracking Anything with Decoupled Video Segmentation
To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation.
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
In 3D Referring Expression Segmentation (3D-RES), the earlier approach adopts a two-stage paradigm, extracting segmentation proposals and then matching them with referring expressions.
Referring Image Segmentation Using Text Supervision
Hence, we propose a novel weakly-supervised RIS framework to formulate the target localization problem as a classification process to differentiate between positive and negative text expressions.
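As a rough, generic illustration of text-only supervision of this flavor (a hedged sketch of the positive-vs-negative expression idea, not the framework proposed in the paper above), one can score a pooled image feature against the positive expression and several negative expressions and apply a standard classification loss so the positive expression wins.

```python
# Illustrative only: compare a pooled image feature to one positive and
# several negative expression embeddings; cross-entropy pushes the positive
# expression to score highest. All shapes and encoders are hypothetical.
import torch
import torch.nn.functional as F

def text_classification_loss(image_feat, pos_text, neg_texts, temperature=0.07):
    """image_feat: (D,), pos_text: (D,), neg_texts: (N, D)."""
    texts = torch.cat([pos_text[None], neg_texts], dim=0)        # (1 + N, D)
    sims = F.cosine_similarity(image_feat[None], texts, dim=-1)  # (1 + N,)
    logits = sims / temperature
    target = torch.tensor(0)          # index 0 is the positive expression
    return F.cross_entropy(logits[None], target[None])

loss = text_classification_loss(torch.randn(64), torch.randn(64), torch.randn(8, 64))
print(loss.item())
```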
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images.
EPCFormer: Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation
Next, we propose an Expression Alignment (EA) mechanism for audio and text expressions.
Spectrum-guided Multi-granularity Referring Video Object Segmentation
To address the drift problem, we propose a Spectrum-guided Multi-granularity (SgMg) approach, which performs direct segmentation on the encoded features and employs visual details to further optimize the masks.
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
Parameter-Efficient Tuning (PET) has gained attention for reducing the number of trainable parameters while maintaining performance and saving hardware resources, but few studies have investigated dense prediction tasks or the interaction between modalities.
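As a minimal sketch of the general PET idea (freeze the pretrained encoder and train only small injected modules; this is a generic residual-adapter example under assumed sizes, not the specific vision-language bridging scheme of the paper above):

```python
# Generic parameter-efficient tuning sketch: freeze a pretrained encoder and
# train only a small residual bottleneck adapter. Dimensions are hypothetical.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
for p in encoder.parameters():
    p.requires_grad = False          # pretrained weights stay fixed

class Adapter(nn.Module):
    def __init__(self, dim=512, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))   # residual adapter

adapter = Adapter()
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-3)  # only adapter trains

features = torch.randn(4, 512)
out = adapter(encoder(features))
n_trainable = sum(p.numel() for p in adapter.parameters())
n_frozen = sum(p.numel() for p in encoder.parameters())
print(f"trainable: {n_trainable}, frozen: {n_frozen}")
```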
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation
Referring video object segmentation (RVOS) aims at segmenting an object in a video following human instruction.