Referring Expression Segmentation

66 papers with code • 25 benchmarks • 11 datasets

The task aims at labeling the pixels of an image or video that represent an object instance referred to by a linguistic expression. In particular, the referring expression (RE) must unambiguously identify an individual object (the referent) within a discourse or scene.
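The input/output contract of the task can be sketched in a few lines. The function and the word-overlap matcher below are hypothetical stand-ins for illustration only (a real RES model grounds the expression in pixels with a learned vision-language model); the interface is the point: image-level instance information plus one expression in, one per-pixel binary mask out.

```python
def segment_referent(instance_masks, expression):
    """Return the binary mask of the instance best matching the expression.

    instance_masks: dict mapping an object description to an HxW 0/1 mask
                    (a toy stand-in for a model's pixel-level predictions).
    expression: referring expression, e.g. "the box on the left".
    """
    # Toy matcher: pick the instance whose description shares the most
    # words with the expression. A real RES model does this grounding
    # jointly over language and image features, not over text labels.
    words = set(expression.lower().split())
    best = max(instance_masks,
               key=lambda desc: len(words & set(desc.lower().split())))
    return instance_masks[best]


# Toy 2x4 "image" containing two object instances.
masks = {
    "left box":  [[1, 1, 0, 0],
                  [1, 1, 0, 0]],
    "right box": [[0, 0, 1, 1],
                  [0, 0, 1, 1]],
}
mask = segment_referent(masks, "the box on the left")
# mask is the per-pixel segmentation of the referent ("left box")
```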

PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

zamling/psalm 21 Mar 2024

PSALM is a powerful extension of the Large Multi-modal Model (LMM) that addresses the challenges of the segmentation task.

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

minghanli/univs 28 Feb 2024

Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

foundationvision/uniref 25 Dec 2023

We evaluate our unified models on various benchmarks.

General Object Foundation Model for Images and Videos at Scale

FoundationVision/GLEE 14 Dec 2023

We present GLEE in this work, an object-level foundation model for locating and identifying objects in images and videos.

EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment

lavreniuk/evp 13 Dec 2023

Second, we propose a novel image-text alignment module for improved feature extraction of the Stable Diffusion backbone.

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

rubics-xuan/mres 13 Dec 2023

To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset and model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES.

Universal Segmentation at Arbitrary Granularity with Language Instruction

workforai/UniLSeg 4 Dec 2023

This paper aims to achieve universal segmentation at an arbitrary semantic level.

InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation

rongyaofang/instructseq 30 Nov 2023

In this work, we introduce InstructSeq, an instruction-conditioned multi-modal modeling framework that unifies diverse vision tasks through flexible natural language control and handling of both visual and textual data.

NExT-Chat: An LMM for Chat, Detection and Segmentation

next-chatv/next-chat 8 Nov 2023

The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs).

GLaMM: Pixel Grounding Large Multimodal Model

mbzuai-oryx/groundingLMM 6 Nov 2023

In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.
