Vision and Language Navigation
88 papers with code • 5 benchmarks • 13 datasets
Libraries
Use these libraries to find Vision and Language Navigation models and implementations.

Latest papers
VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation
Performance on Vision-and-Language Navigation (VLN) tasks has improved rapidly in recent years, thanks to the use of large pre-trained vision-and-language models.
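As a rough sketch of the parameter-efficient transfer learning idea in general (not the paper's actual VLN-PETL modules), the snippet below freezes a pretrained backbone and trains only small bottleneck adapters; all names and dimensions here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, non-linearity, up-project,
    added residually. Only these small layers are trained; the pretrained
    transformer weights stay frozen."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's features intact.
        return x + self.up(self.act(self.down(x)))

adapter = BottleneckAdapter(hidden_dim=768)
out = adapter(torch.randn(2, 10, 768))  # (batch, seq_len, hidden_dim)

# In a full model one would freeze the backbone (hypothetical `model`):
# for p in model.backbone.parameters():
#     p.requires_grad = False
```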
AerialVLN: Vision-and-Language Navigation for UAVs
Navigating in the sky is more complicated than navigating on the ground, because agents must account for flying height and reason about more complex spatial relationships.
Scaling Data Generation in Vision-and-Language Navigation
Recent research in language-guided visual navigation has shown that training generalizable agents demands both diverse traversable environments and large amounts of supervision.
Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for Navigation Instruction Generation
We introduce Kefa, a novel speaker model for navigation instruction generation.
GridMM: Grid Memory Map for Vision-and-Language Navigation
Vision-and-language navigation (VLN) requires an agent to navigate to a remote location in a 3D environment by following natural language instructions.
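In the generic sense, a grid memory map projects egocentric observations into a top-down grid of cells, each aggregating the visual features observed there. The sketch below shows this bookkeeping under assumed names (`GridMemory`, `update`) and is not the paper's implementation.

```python
import numpy as np

class GridMemory:
    """Minimal top-down grid memory: each cell keeps a running mean of the
    visual features of observations whose world coordinates fall inside it."""
    def __init__(self, size: int = 64, cell_m: float = 0.5, feat_dim: int = 512):
        self.cell_m = cell_m
        self.feats = np.zeros((size, size, feat_dim), dtype=np.float32)
        self.counts = np.zeros((size, size), dtype=np.int32)
        self.origin = size // 2  # agent starts at the grid center

    def update(self, xy_m: np.ndarray, feats: np.ndarray) -> None:
        """xy_m: (N, 2) world coordinates in meters; feats: (N, D) features."""
        ij = (xy_m / self.cell_m).astype(int) + self.origin
        for (i, j), f in zip(ij, feats):
            if 0 <= i < self.feats.shape[0] and 0 <= j < self.feats.shape[1]:
                n = self.counts[i, j]
                # Incremental running mean of features seen in this cell.
                self.feats[i, j] = (self.feats[i, j] * n + f) / (n + 1)
                self.counts[i, j] += 1
```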
Learning Navigational Visual Representations with Semantic Map Supervision
Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot.
Learning Vision-and-Language Navigation from YouTube Videos
In this paper, we propose to learn an agent from house tour videos on YouTube by creating a large-scale dataset of reasonable path-instruction pairs extracted from such videos and pre-training the agent on it.
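To make the data construction concrete, a path-instruction pair might be represented as a record like the one below; the field names are assumptions for illustration, not the paper's schema.

```python
from dataclasses import dataclass

@dataclass
class PathInstructionPair:
    """Illustrative record for a path-instruction pair mined from a house
    tour video (hypothetical fields, not the paper's actual format)."""
    video_id: str          # source YouTube video
    frame_ids: list[int]   # frames sampled along the traversed path
    actions: list[str]     # e.g. ["forward", "turn_left", ...]
    instruction: str       # language instruction paired with the path

pair = PathInstructionPair(
    video_id="abc123",
    frame_ids=[0, 30, 60],
    actions=["forward", "turn_left"],
    instruction="Go down the hallway and turn left at the kitchen.",
)
```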
Behavioral Analysis of Vision-and-Language Navigation Agents
To be successful, Vision-and-Language Navigation (VLN) agents must be able to ground instructions to actions based on their surroundings.
VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
In this work, we propose VELMA, an embodied LLM agent that uses a verbalization of its trajectory and visual environment observations as a contextual prompt for predicting the next action.
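In its simplest form, the verbalization idea turns the agent's trajectory and observations into text appended to the LLM prompt. The sketch below uses made-up helper names (`verbalize_step`, `build_prompt`) and an invented prompt format; it is not VELMA's actual prompting scheme.

```python
def verbalize_step(step: int, action: str, landmarks: list[str]) -> str:
    """Render one trajectory step as a line of text (illustrative format)."""
    seen = ", ".join(landmarks) if landmarks else "nothing notable"
    return f"Step {step}: I went {action} and saw {seen}."

def build_prompt(instruction: str, history: list[tuple[str, list[str]]]) -> str:
    """Concatenate the instruction with the verbalized history; the LLM is
    then asked to emit the next action as text."""
    lines = [f"Navigation instruction: {instruction}"]
    lines += [verbalize_step(i + 1, a, lm) for i, (a, lm) in enumerate(history)]
    lines.append("Next action (forward/left/right/stop):")
    return "\n".join(lines)

print(build_prompt(
    "Walk past the cafe and stop at the traffic light.",
    [("forward", ["a cafe on the left"]), ("forward", ["a traffic light"])],
))
```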
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
Trained on an unprecedented scale of data, large language models (LLMs) such as ChatGPT and GPT-4 exhibit significant emergent reasoning abilities that arise from model scaling.