TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Visual Navigation	Cooperative Vision-and-Dialogue Navigation	NaviLLM	dist_to_end_reduction	7.90	# 1
Visual Navigation	Cooperative Vision-and-Dialogue Navigation	NaviLLM	spl	0.09	# 14
Visual Navigation	R2R	NaviLLM	spl	0.60	# 2
3D Question Answering (3D-QA)	ScanQA Test w/ objects	NaviLLM	Exact Match	26.27	# 2
3D Question Answering (3D-QA)	ScanQA Test w/ objects	NaviLLM	BLEU-1	39.73	# 1
3D Question Answering (3D-QA)	ScanQA Test w/ objects	NaviLLM	BLEU-4	13.90	# 2
3D Question Answering (3D-QA)	ScanQA Test w/ objects	NaviLLM	ROUGE	40.23	# 2
3D Question Answering (3D-QA)	ScanQA Test w/ objects	NaviLLM	METEOR	16.56	# 1
3D Question Answering (3D-QA)	ScanQA Test w/ objects	NaviLLM	CIDEr	80.77	# 2
Visual Navigation	SOON Test	NaviLLM	Nav-SPL	26.26	# 2
Visual Navigation	SOON Test	NaviLLM	SR	35.04	# 3

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-learning-a-generalist-model-for/visual-navigation-on-cooperative-vision-and-1)](https://paperswithcode.com/sota/visual-navigation-on-cooperative-vision-and-1?p=towards-learning-a-generalist-model-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-learning-a-generalist-model-for/visual-navigation-on-room-to-room-1)](https://paperswithcode.com/sota/visual-navigation-on-room-to-room-1?p=towards-learning-a-generalist-model-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-learning-a-generalist-model-for/3d-question-answering-3d-qa-on-scanqa-test-w)](https://paperswithcode.com/sota/3d-question-answering-3d-qa-on-scanqa-test-w?p=towards-learning-a-generalist-model-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-learning-a-generalist-model-for/visual-navigation-on-soon-test)](https://paperswithcode.com/sota/visual-navigation-on-soon-test?p=towards-learning-a-generalist-model-for)`

Towards Learning a Generalist Model for Embodied Navigation

4 Dec 2023 · Duo Zheng, Shijia Huang, Lin Zhao, Yiwu Zhong, LiWei Wang

Building a generalist agent that can interact with the world is an intriguing goal of AI systems, and it has spurred research on embodied navigation, where an agent must navigate according to instructions or respond to queries. Despite major progress, previous work has focused primarily on task-specific agents that generalize poorly to unseen scenarios. Recently, LLMs have shown remarkable capabilities across various fields, offering a promising opportunity for embodied navigation. Drawing on this, we propose the first generalist model for embodied navigation, NaviLLM. It adapts LLMs to embodied navigation by introducing schema-based instruction, which flexibly casts various tasks into generation problems and thereby unifies a wide range of tasks. This approach allows us to integrate diverse data sources from various datasets into training, equipping NaviLLM with the wide range of capabilities required by embodied navigation. We conduct extensive experiments to evaluate the performance and generalizability of our model. The experimental results demonstrate that our unified model achieves state-of-the-art performance on CVDN, SOON, and ScanQA. Specifically, it surpasses the previous state-of-the-art method by a significant margin of 29% in goal progress on CVDN. Moreover, our model demonstrates strong generalizability and presents impressive results on unseen tasks, e.g., embodied question answering and 3D captioning.
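
To make the idea of casting different tasks into one generation problem more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `SchemaPrompt` class, its field names (`task`, `observation`, `history`, `output_hint`), and the `<view_i>` / `<step_i>` placeholders are all hypothetical and chosen only for illustration. The sketch shows how a navigation instruction and a question could be rendered through the same text template, so that a single generative model could be trained on both.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaPrompt:
    """Hypothetical schema; the field names are illustrative, not taken from the paper."""
    task: str                      # natural-language description of what the agent must do
    observation: List[str]         # textual placeholders standing in for the current views
    history: List[str] = field(default_factory=list)  # placeholders for earlier steps
    output_hint: str = ""          # tells the model what kind of output to generate

    def render(self) -> str:
        """Flatten the schema into a single text prompt for a generative model."""
        parts = [f"Task: {self.task}"]
        if self.history:  # history is omitted entirely when there is none
            parts.append("History: " + " ".join(self.history))
        parts.append("Observation: " + " ".join(self.observation))
        parts.append(f"Output hint: {self.output_hint}")
        return "\n".join(parts)


# Two different tasks expressed through the same (assumed) schema.
navigation = SchemaPrompt(
    task="Walk past the sofa and stop at the kitchen door.",
    observation=["<view_0>", "<view_1>", "<view_2>"],
    history=["<step_0>"],
    output_hint="Select one of the candidate views.",
)
question_answering = SchemaPrompt(
    task="What color is the chair next to the table?",
    observation=["<view_0>", "<view_1>"],
    output_hint="Answer the question in a few words.",
)

print(navigation.render())
print(question_answering.render())
```

Because both prompts share one template, supporting another task in this sketch only requires a new `task` string and `output_hint`, not a new model head.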

Code

zd11024/NaviLLM official

lavi-lab/navillm official

Tasks

3D Question Answering (3D-QA)

Embodied Question Answering

Navigate

Question Answering

Visual Navigation

Datasets

R2R

EQA

Results from the Paper

Ranked #1 on Visual Navigation on Cooperative Vision-and-Dialogue Navigation

Methods

Focus
