Task	Dataset	Model	Metric Name	Metric Value	Global Rank
3D Open-Vocabulary Instance Segmentation	S3DIS	Lowis3D	AP50 Base B8/N4	58.7	#3
3D Open-Vocabulary Instance Segmentation	S3DIS	Lowis3D	AP50 Novel B8/N4	13.8	#3
3D Open-Vocabulary Instance Segmentation	S3DIS	Lowis3D	AP50 Base B6/N6	51.8	#1
3D Open-Vocabulary Instance Segmentation	S3DIS	Lowis3D	AP50 Novel B6/N6	15.8	#3
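
The table lists base-class and novel-class AP50 separately for two S3DIS partitions. B8/N4 and B6/N6 presumably denote 8 base / 4 novel and 6 base / 6 novel class splits. Open-vocabulary segmentation work often summarizes such a pair with its harmonic mean, which stays low unless both scores are high. This page does not report that number, so the values below are only derived from the table, and `harmonic_mean` is a hypothetical helper name.

```python
def harmonic_mean(base: float, novel: float) -> float:
    """Harmonic mean of a base-class and a novel-class score (e.g. AP50 in %)."""
    if base + novel == 0:
        return 0.0
    return 2 * base * novel / (base + novel)


# AP50 (base, novel) for Lowis3D on S3DIS, copied from the table.
results = {
    "B8/N4": (58.7, 13.8),
    "B6/N6": (51.8, 15.8),
}

for split, (base, novel) in results.items():
    print(f"{split}: hAP50 = {harmonic_mean(base, novel):.1f}")
# B8/N4: hAP50 = 22.3
# B6/N6: hAP50 = 24.2
```

The harmonic mean is dominated by the weaker score, so the low novel-class AP50 keeps the summary near 22 to 24 even though base-class AP50 exceeds 50.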


Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding

1 Aug 2023 · Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi

Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset. This task is challenging because the model needs to both localize novel 3D objects and infer their semantic categories. A key factor behind the recent progress in 2D open-world perception is the availability of large-scale image-text pairs from the Internet, which cover a wide range of vocabulary concepts. However, this success is hard to replicate in 3D scenarios due to the scarcity of 3D-text pairs. To address this challenge, we propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for multi-view images of 3D scenes. This allows us to establish explicit associations between 3D shapes and semantic-rich captions. Moreover, to enhance the fine-grained visual-semantic representation learning from captions for object-level categorization, we design hierarchical point-caption association methods to learn semantic-aware embeddings that exploit the 3D geometry between 3D points and multi-view images. In addition, to tackle the localization challenge for novel classes in the open-world setting, we develop debiased instance localization, which involves training object grouping modules on unlabeled data using instance-level pseudo supervision. This significantly improves the generalization capabilities of instance grouping and thus the ability to accurately locate novel objects. We conduct extensive experiments on 3D semantic, instance, and panoptic segmentation tasks, covering indoor and outdoor scenes across three datasets. Our method outperforms baseline methods by a significant margin in semantic segmentation (e.g., 34.5% to 65.3%), instance segmentation (e.g., 21.8% to 54.0%) and panoptic segmentation (e.g., 14.7% to 43.3%). Code will be available.
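
The abstract describes associating 3D points with captions of multi-view images through the geometry that links points and images, but gives no formulas. The sketch below is a minimal illustration under stated assumptions: a pinhole camera with known intrinsics `K` and world-to-camera poses, no occlusion handling, and an InfoNCE-style contrastive loss as the training signal. It covers only a single view-level association, not the full hierarchical scheme, and `points_in_view` and `caption_contrastive_loss` are hypothetical names, not taken from the authors' code.

```python
import numpy as np


def points_in_view(points, intrinsic, world_to_cam, width, height):
    """Boolean mask of the 3D points that project inside one image.

    points: (N, 3) world coordinates; intrinsic: (3, 3) pinhole matrix K;
    world_to_cam: (4, 4) extrinsic pose. Occlusion is ignored here (a
    simplifying assumption; a depth-map check would normally be added).
    """
    homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4) homogeneous
    cam = (world_to_cam @ homo.T).T[:, :3]                  # camera coordinates
    in_front = cam[:, 2] > 1e-6
    z = np.where(in_front, cam[:, 2], 1.0)                  # avoid dividing by <= 0
    pix = (intrinsic @ cam.T).T                             # (N, 3) before division
    u, v = pix[:, 0] / z, pix[:, 1] / z
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return in_front & inside


def caption_contrastive_loss(point_feats, masks, caption_embs, temperature=0.07):
    """InfoNCE-style loss (assumed objective): the pooled features of the points
    seen in view i should score higher against caption i than other captions."""
    if not all(m.any() for m in masks):
        raise ValueError("every view must see at least one point")
    pooled = np.stack([point_feats[m].mean(axis=0) for m in masks])  # (V, D)
    pooled = pooled / np.linalg.norm(pooled, axis=1, keepdims=True)
    caps = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    logits = pooled @ caps.T / temperature                 # (V, V) cosine / T
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))             # matched pairs on diagonal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.uniform(-1.0, 1.0, size=(500, 3)) + np.array([0.0, 0.0, 3.0])
    K = np.array([[100.0, 0.0, 64.0], [0.0, 100.0, 48.0], [0.0, 0.0, 1.0]])
    shifted = np.eye(4)
    shifted[0, 3] = 0.5                                    # second view, translated along x
    masks = [points_in_view(points, K, pose, 128, 96) for pose in (np.eye(4), shifted)]
    point_feats = rng.normal(size=(500, 16))               # stand-in for a 3D backbone
    caption_embs = rng.normal(size=(2, 16))                # stand-in for text-encoder output
    print([int(m.sum()) for m in masks], caption_contrastive_loss(point_feats, masks, caption_embs))
```

Pooling point features per view before comparing them to the caption keeps the supervision at the granularity the caption actually describes (a whole image), which is why finer-grained associations would need smaller point sets per caption.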


Code


No code implementations yet.

Tasks


3D Open-Vocabulary Instance Segmentation

Instance Segmentation

Panoptic Segmentation

Representation Learning

Scene Understanding

Segmentation

Semantic Segmentation

Datasets

ScanNet

S3DIS

Results from the Paper


Ranked #3 on 3D Open-Vocabulary Instance Segmentation on S3DIS



Methods


No methods listed for this paper.
