TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
3D Object Captioning	Objaverse	3D-LLM	GPT-4	33.42	# 6
3D Object Captioning	Objaverse	3D-LLM	Sentence-BERT	44.48	# 6
3D Object Captioning	Objaverse	3D-LLM	SimCSE	43.68	# 6
3D Object Captioning	Objaverse	3D-LLM	Correctness	1.77	# 4
3D Object Captioning	Objaverse	3D-LLM	Hallucination	1.16	# 4
3D Object Captioning	Objaverse	3D-LLM	Precision	60.39	# 4
Generative 3D Object Classification	Objaverse	3D-LLM	Objaverse (I)	49.00	# 4
Generative 3D Object Classification	Objaverse	3D-LLM	Objaverse (Average)	45.25	# 6
Generative 3D Object Classification	Objaverse	3D-LLM	Objaverse (C)	41.50	# 4
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (flamingo)	Exact Match	23.2	# 4
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (flamingo)	BLEU-1	32.6	# 5
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (flamingo)	BLEU-4	8.4	# 6
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (flamingo)	ROUGE	34.8	# 4
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (flamingo)	METEOR	13.5	# 6
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (flamingo)	CIDEr	65.6	# 6
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-opt)	Exact Match	19.1	# 7
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-opt)	BLEU-1	37.3	# 3
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-opt)	BLEU-4	10.7	# 5
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-opt)	ROUGE	34.5	# 5
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-opt)	METEOR	14.3	# 4
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-opt)	CIDEr	67.1	# 5
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-flant5)	Exact Match	19.1	# 7
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-flant5)	BLEU-1	38.3	# 2
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-flant5)	BLEU-4	11.6	# 4
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-flant5)	ROUGE	35.3	# 3
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-flant5)	METEOR	14.9	# 3
3D Question Answering (3D-QA)	ScanQA Test w/ objects	3D-LLM (BLIP2-flant5)	CIDEr	69.6	# 3
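
The Objaverse (Average) entry in the table above appears to be the arithmetic mean of the Objaverse (I) and Objaverse (C) entries. The short check below uses only the three values from the table. The function name `mean_of_two` is illustrative, and reading the average as a plain mean is an assumption that the page itself does not state.

```python
# Values copied from the table above
# (3D-LLM, Generative 3D Object Classification on Objaverse).
objaverse_i = 49.00
objaverse_c = 41.50
reported_average = 45.25


def mean_of_two(first: float, second: float) -> float:
    """Arithmetic mean of two scores (assumed aggregation, not stated on the page)."""
    return (first + second) / 2


computed_average = mean_of_two(objaverse_i, objaverse_c)

# The assumed aggregation reproduces the reported value.
assert abs(computed_average - reported_average) < 1e-9
```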

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/3d-llm-injecting-the-3d-world-into-large/3d-question-answering-3d-qa-on-scanqa-test-w)](https://paperswithcode.com/sota/3d-question-answering-3d-qa-on-scanqa-test-w?p=3d-llm-injecting-the-3d-world-into-large)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/3d-llm-injecting-the-3d-world-into-large/3d-object-captioning-on-objaverse-1)](https://paperswithcode.com/sota/3d-object-captioning-on-objaverse-1?p=3d-llm-injecting-the-3d-world-into-large)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/3d-llm-injecting-the-3d-world-into-large/generative-3d-object-classification-on-1)](https://paperswithcode.com/sota/generative-3d-object-classification-on-1?p=3d-llm-injecting-the-3d-world-into-large)`

3D-LLM: Injecting the 3D World into Large Language Models

NeurIPS 2023 · Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan

Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses the state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs. Project Page: https://vis-www.cs.umass.edu/3dllm/.
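
The abstract says that 3D features are obtained from rendered multi-view images, but gives no detail on how. As a rough illustration only, the sketch below lifts 2D features to 3D points by averaging, for each point, the feature at the pixel it projects to in every view where it is visible. The averaging rule, the function name `lift_to_points`, and the input layout (`view_feats`, `pixel_rc`) are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Optional, Tuple

Feature = List[float]
FeatureMap = List[List[Feature]]          # indexed as [row][col] -> feature vector
Hit = Optional[Tuple[int, int]]           # (row, col) of a point in a view, None if not visible


def lift_to_points(view_feats: List[FeatureMap], pixel_rc: List[List[Hit]]) -> List[Feature]:
    """Average per-view 2D features into one feature per 3D point.

    Hypothetical helper: view_feats[v][r][c] is the feature at pixel (r, c)
    of view v, and pixel_rc[v][n] is where point n lands in view v.
    Points that are never visible keep an all-zero feature.
    """
    num_points = len(pixel_rc[0])
    dim = len(view_feats[0][0][0])
    sums = [[0.0] * dim for _ in range(num_points)]
    counts = [0] * num_points

    for feats, hits in zip(view_feats, pixel_rc):
        for n, hit in enumerate(hits):
            if hit is None:               # point not visible in this view
                continue
            row, col = hit
            for k, value in enumerate(feats[row][col]):
                sums[n][k] += value
            counts[n] += 1

    return [
        [s / counts[n] for s in sums[n]] if counts[n] else sums[n]
        for n in range(num_points)
    ]


# Toy input: two 2x2 views with constant 2-dimensional features.
view_a = [[[1.0, 0.0] for _ in range(2)] for _ in range(2)]
view_b = [[[3.0, 2.0] for _ in range(2)] for _ in range(2)]
hits = [
    [(0, 0), None, None],      # view_a sees only point 0
    [(1, 1), (0, 1), None],    # view_b sees points 0 and 1
]
point_feats = lift_to_points([view_a, view_b], hits)
```

In this toy input, point 0 is seen by both views and receives the mean of their features, point 1 takes the feature of the single view that sees it, and point 2 stays at zero. A real pipeline would also need camera projection and occlusion handling, which this sketch leaves out.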

Code

umass-foundation-model/3d-llm

771

openrobotlab/pointllm

388

Pointcept/GPT4Point

255

qizekun/ShapeLLM

Yui010206/CREMA

Tasks

3D Object Captioning

3D Question Answering (3D-QA)

Dense Captioning

Generative 3D Object Classification

Question Answering

Datasets

Objaverse

HM3D

Results from the Paper

Ranked #4 on 3D Question Answering (3D-QA) on ScanQA Test w/ objects
