Referring Expression Segmentation

68 papers with code • 25 benchmarks • 11 datasets

The task is to label the pixels of an image or video that belong to the object instance referred to by a linguistic expression. The referring expression (RE) must unambiguously identify a single object in the discourse or scene (the referent).
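The definition above can be illustrated with a toy sketch. It is not taken from any of the listed papers: the scene, the attribute-based "expression", and the names `SceneObject`, `resolve_referent`, and `referent_mask` are all hypothetical, and a real model works on raw pixels and free-form text rather than a ready-made instance map. The sketch only shows the two parts of the task: the expression must single out exactly one object, and the output is a per-pixel mask for that object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical description of one object instance in a scene."""
    instance_id: int
    category: str
    color: str


def resolve_referent(objects, **attributes):
    """Return the single object matching every attribute.

    Raises ValueError when the expression matches no object or more
    than one, i.e. when it is not a valid referring expression here.
    """
    matches = [
        obj for obj in objects
        if all(getattr(obj, key) == value for key, value in attributes.items())
    ]
    if len(matches) != 1:
        raise ValueError(
            f"expression matches {len(matches)} objects, expected exactly 1"
        )
    return matches[0]


def referent_mask(instance_map, instance_id):
    """Binary mask: True where the pixel belongs to the referent."""
    return [[pixel == instance_id for pixel in row] for row in instance_map]


# Toy 3x4 "image" given as an instance map: 0 = background, 1 and 2 = objects.
instance_map = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
]
objects = [
    SceneObject(instance_id=1, category="cup", color="red"),
    SceneObject(instance_id=2, category="cup", color="blue"),
]

# "the red cup" identifies one instance; "the cup" alone would be ambiguous.
target = resolve_referent(objects, category="cup", color="red")
mask = referent_mask(instance_map, target.instance_id)
```

In this toy scene, calling `resolve_referent(objects, category="cup")` raises `ValueError`, because "the cup" does not pick out an individual object.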

GLaMM: Pixel Grounding Large Multimodal Model

mbzuai-oryx/groundingLMM 6 Nov 2023

In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.

576

Towards Omni-supervised Referring Expression Segmentation

nineblu/omni-res 1 Nov 2023

To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training.

2

Tracking Anything with Decoupled Video Segmentation

hkchengrex/Tracking-Anything-with-DEVA ICCV 2023

To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation.

1,062
07 Sep 2023

3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation

sosppxo/3d-stmn 31 Aug 2023

In 3D Referring Expression Segmentation (3D-RES), the earlier approach adopts a two-stage paradigm, extracting segmentation proposals and then matching them with referring expressions.

31

Referring Image Segmentation Using Text Supervision

fawnliu/tris ICCV 2023

Hence, we propose a novel weakly-supervised RIS framework to formulate the target localization problem as a classification process to differentiate between positive and negative text expressions.

55
28 Aug 2023

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

qwenlm/qwen-vl 24 Aug 2023

In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images.

3,707

EPCFormer: Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation

lab206/epcformer 8 Aug 2023

Next, we propose an Expression Alignment (EA) mechanism for audio and text expressions.

9

Spectrum-guided Multi-granularity Referring Video Object Segmentation

bo-miao/sgmg ICCV 2023

To address the drift problem, we propose a Spectrum-guided Multi-granularity (SgMg) approach, which performs direct segmentation on the encoded features and employs visual details to further optimize the masks.

72
25 Jul 2023

Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

kkakkkka/etris ICCV 2023

Parameter Efficient Tuning (PET) has gained attention for reducing the number of parameters while maintaining performance and providing better hardware resource savings, but few studies investigate dense prediction tasks and interaction between modalities.

84
21 Jul 2023

OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation

wudongming97/onlinerefer ICCV 2023

Referring video object segmentation (RVOS) aims at segmenting an object in a video following human instruction.

44
18 Jul 2023