Generalized Referring Expression Segmentation

6 papers with code • 1 benchmark • 1 dataset

Generalized Referring Expression Segmentation (GRES), introduced by Liu et al. in CVPR 2023, allows expressions that indicate any number of target objects. GRES takes an image and a referring expression as input, and requires mask prediction of the target object(s).
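The input/output contract above can be sketched with a toy, hypothetical helper (this is not the ReLA model; the function name, the word-matching heuristic, and the mask format are illustrative assumptions). The key difference from classic RES is that the expression may match zero, one, or many instances, so the prediction unions all matched instance masks and may be empty:

```python
# Toy sketch of the GRES interface (hypothetical; real methods predict
# masks with a segmentation network, not by keyword matching).
def gres_predict(expression, instance_masks):
    """Return one binary mask covering every instance whose label appears
    in the expression; an all-zero mask if no instance matches."""
    first = next(iter(instance_masks.values()))
    h, w = len(first), len(first[0])
    out = [[0] * w for _ in range(h)]          # empty mask by default
    words = expression.lower().split()
    for label, mask in instance_masks.items():
        if label in words:                     # expression refers to this instance
            for i in range(h):
                for j in range(w):
                    out[i][j] |= mask[i][j]    # union of all matched masks
    return out

# Per-instance binary masks for a 2x2 image (illustrative data)
masks = {
    "cat": [[1, 0], [0, 0]],
    "dog": [[0, 0], [0, 1]],
}
print(gres_predict("the cat and the dog", masks))  # multi-target: [[1, 0], [0, 1]]
print(gres_predict("the zebra", masks))            # no-target: [[0, 0], [0, 0]]
```

The multi-target and no-target cases are exactly what distinguish GRES from classic RES, where every expression is assumed to refer to exactly one object.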

Most implemented papers

GRES: Generalized Referring Expression Segmentation

henghuiding/ReLA CVPR 2023

Existing classic RES datasets and methods commonly support only single-target expressions, i.e., one expression refers to one target object.

MAttNet: Modular Attention Network for Referring Expression Comprehension

lichengunc/MAttNet CVPR 2018

In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.

Vision-Language Transformer and Query Generation for Referring Segmentation

henghuiding/Vision-Language-Transformer ICCV 2021

We introduce a transformer with multi-head attention to build a network with an encoder-decoder attention mechanism that "queries" the given image with the language expression.

CRIS: CLIP-Driven Referring Image Segmentation

DerrickWang005/CRIS.pytorch CVPR 2022

In addition, we present text-to-pixel contrastive learning to explicitly enforce the text feature to be similar to the related pixel-level features and dissimilar to the irrelevant ones.

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

yz93/lavt-ris CVPR 2022

Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image.

PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

zamling/psalm 21 Mar 2024

PSALM is a powerful extension of the Large Multi-modal Model (LMM) that addresses the challenges of segmentation tasks.