TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Generalized Referring Expression Segmentation	gRefCOCO	GROUNDHOG	gIoU	66.70	# 1
Referring Expression Segmentation	PhraseCut	GROUNDHOG	Mean IoU	54.5	# 2
Referring Expression Segmentation	RefCOCOg-test	GROUNDHOG	Overall IoU	74.6	# 3
Referring Expression Segmentation	RefCOCOg-val	GROUNDHOG	Overall IoU	74.1	# 3
Referring Expression Segmentation	RefCOCO testA	GROUNDHOG	Overall IoU	79.9	# 4
Referring Expression Segmentation	RefCOCO+ testA	GROUNDHOG	Overall IoU	75.0	# 4
Referring Expression Segmentation	RefCOCO testB	GROUNDHOG	Overall IoU	75.7	# 2
Referring Expression Segmentation	RefCOCO+ testB	GROUNDHOG	Overall IoU	64.9	# 4
Referring Expression Segmentation	RefCOCO val	GROUNDHOG	Overall IoU	78.5	# 5
Referring Expression Segmentation	RefCOCO+ val	GROUNDHOG	Overall IoU	70.5	# 5
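
The three metric names in the table above aggregate mask overlap differently. The sketch below is a minimal illustration, not the benchmarks' official evaluation code. It assumes the usual conventions: Overall IoU is cumulative intersection over cumulative union, Mean IoU is the average of per-sample IoU, and gIoU on gRefCOCO is the per-sample mean in which a no-target sample scores 1 when the prediction is also empty and 0 otherwise. The function names and the nested-list mask format are hypothetical.

```python
from itertools import chain


def iou_counts(pred, gt):
    """Return (intersection, union) pixel counts for two same-sized binary masks."""
    p = list(chain.from_iterable(pred))
    g = list(chain.from_iterable(gt))
    inter = sum(1 for a, b in zip(p, g) if a and b)
    union = sum(1 for a, b in zip(p, g) if a or b)
    return inter, union


def overall_iou(preds, gts):
    """Cumulative intersection divided by cumulative union over all samples."""
    counts = [iou_counts(p, g) for p, g in zip(preds, gts)]
    total_inter = sum(i for i, _ in counts)
    total_union = sum(u for _, u in counts)
    return total_inter / total_union if total_union else 0.0


def mean_iou(preds, gts, both_empty=1.0):
    """Average per-sample IoU; both_empty is the score when pred and gt are both empty.

    With both_empty=1.0 and no-target samples included, this matches the
    gIoU convention assumed in the lead-in (an assumption, not official code).
    """
    scores = []
    for p, g in zip(preds, gts):
        inter, union = iou_counts(p, g)
        scores.append(inter / union if union else both_empty)
    return sum(scores) / len(scores)


# Two toy 2x2 samples: per-sample IoU is 1/2 and 1/4.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
gts = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
print(round(overall_iou(preds, gts), 3))  # 0.333 (2 / 6)
print(round(mean_iou(preds, gts), 3))     # 0.375 ((0.5 + 0.25) / 2)

# Add a correctly predicted no-target sample (both masks empty).
empty = [[0, 0], [0, 0]]
print(round(mean_iou(preds + [empty], gts + [empty]), 3))  # 0.583
```

Overall IoU weights large objects more heavily because it pools pixels across samples, while the per-sample means treat every expression equally, so the two can rank models differently on the same predictions.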

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/groundhog-grounding-large-language-models-to/generalized-referring-expression-segmentation)](https://paperswithcode.com/sota/generalized-referring-expression-segmentation?p=groundhog-grounding-large-language-models-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/groundhog-grounding-large-language-models-to/referring-expression-segmentation-on)](https://paperswithcode.com/sota/referring-expression-segmentation-on?p=groundhog-grounding-large-language-models-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/groundhog-grounding-large-language-models-to/referring-expression-segmentation-on-refcoco-2)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-2?p=groundhog-grounding-large-language-models-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/groundhog-grounding-large-language-models-to/referring-expression-segmentation-on-refcocog-1)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcocog-1?p=groundhog-grounding-large-language-models-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/groundhog-grounding-large-language-models-to/referring-expression-segmentation-on-refcocog)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcocog?p=groundhog-grounding-large-language-models-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/groundhog-grounding-large-language-models-to/referring-expression-segmentation-on-refcoco-1)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-1?p=groundhog-grounding-large-language-models-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/groundhog-grounding-large-language-models-to/referring-expression-segmentation-on-refcoco-4)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-4?p=groundhog-grounding-large-language-models-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/groundhog-grounding-large-language-models-to/referring-expression-segmentation-on-refcoco-5)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-5?p=groundhog-grounding-large-language-models-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/groundhog-grounding-large-language-models-to/referring-expression-segmentation-on-refcoco)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco?p=groundhog-grounding-large-language-models-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/groundhog-grounding-large-language-models-to/referring-expression-segmentation-on-refcoco-3)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-3?p=groundhog-grounding-large-language-models-to)`

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

26 Feb 2024 · Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai

Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level representations that are important for fine-grained visual understanding and diagnosis. In this work, we introduce GROUNDHOG, an MLLM developed by grounding Large Language Models to holistic segmentation. GROUNDHOG incorporates a masked feature extractor and converts extracted features into visual entity tokens for the MLLM backbone, which then connects groundable phrases to unified grounding masks by retrieving and merging the entity masks. To train GROUNDHOG, we carefully curated M3G2, a grounded visual instruction tuning dataset with Multi-Modal Multi-Grained Grounding, by harvesting a collection of segmentation-grounded datasets with rich annotations. Our experimental results show that GROUNDHOG achieves superior performance on various language grounding tasks without task-specific fine-tuning, and significantly reduces object hallucination. GROUNDHOG also demonstrates better grounding towards complex forms of visual input and provides easy-to-understand diagnosis in failure cases.
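
The abstract describes connecting a groundable phrase to a unified grounding mask "by retrieving and merging the entity masks" but gives no formula. The sketch below is one plausible reading, offered only as an illustration: each candidate entity is scored against the phrase, and the entity masks are combined pixel by pixel. The sigmoid-of-dot-product scoring, the score-weighted sum clipped to [0, 1], the 0.5 threshold, and all function and variable names are assumptions, not details taken from the paper.

```python
import math


def retrieval_scores(phrase_vec, entity_vecs):
    """Score each entity against the phrase: sigmoid of the dot product (assumed)."""
    scores = []
    for entity_vec in entity_vecs:
        dot = sum(p * e for p, e in zip(phrase_vec, entity_vec))
        scores.append(1.0 / (1.0 + math.exp(-dot)))
    return scores


def merge_masks(entity_masks, scores, threshold=0.5):
    """Score-weighted pixel-wise sum of entity masks, clipped to 1.0, then binarized."""
    height, width = len(entity_masks[0]), len(entity_masks[0][0])
    merged = [[0.0] * width for _ in range(height)]
    for mask, score in zip(entity_masks, scores):
        for y in range(height):
            for x in range(width):
                merged[y][x] += score * mask[y][x]
    return [[1 if min(v, 1.0) >= threshold else 0 for v in row] for row in merged]


# Three hypothetical entity proposals on a 2x2 image.
entity_masks = [
    [[1, 1], [0, 0]],  # entity 0: top row
    [[0, 0], [1, 1]],  # entity 1: bottom row
    [[0, 0], [0, 1]],  # entity 2: bottom-right pixel
]
phrase_vec = [1.0, 0.0]
entity_vecs = [[4.0, 0.0], [-4.0, 0.0], [4.0, 1.0]]  # entities 0 and 2 match the phrase

scores = retrieval_scores(phrase_vec, entity_vecs)
print(merge_masks(entity_masks, scores))  # [[1, 1], [0, 1]]
```

Because the merged mask is a sum over scored entities, a wrong grounding can be traced back to the specific entity proposals and scores that produced it, which is in the spirit of the failure-case diagnosis the abstract mentions.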

Tasks

Causal Language Modeling

Generalized Referring Expression Segmentation

Hallucination

Language Modelling

Referring Expression Segmentation

Datasets

RefCOCO

Flickr30K Entities

Visual7W

Google Refexp

PhraseCut

gRefCOCO

PointQA

Results from the Paper

Ranked #1 on Generalized Referring Expression Segmentation on gRefCOCO (using extra training data)
