Vision and Language Navigation

88 papers with code • 5 benchmarks • 13 datasets

Vision-and-Language Navigation (VLN) requires an agent to navigate to a target location in a 3D environment by following natural language instructions.

VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation

yanyuanqiao/vln-petl ICCV 2023

The performance on Vision-and-Language Navigation (VLN) tasks has witnessed rapid progress recently, thanks to the use of large pre-trained vision-and-language models.

★ 5 · 20 Aug 2023

AerialVLN: Vision-and-Language Navigation for UAVs

airvln/airvln ICCV 2023

Navigating in the sky is more complicated than on the ground because agents must reason about flying height and more complex spatial relationships.

★ 33 · 13 Aug 2023

Scaling Data Generation in Vision-and-Language Navigation

wz0919/scalevln ICCV 2023

Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.

★ 136 · 28 Jul 2023

Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for Navigation Instruction Generation

haitianzeng/KEFA 25 Jul 2023

We introduce a novel speaker model, Kefa, for navigation instruction generation.

★ 1 · 25 Jul 2023

GridMM: Grid Memory Map for Vision-and-Language Navigation

mrzihan/gridmm ICCV 2023

Vision-and-language navigation (VLN) enables the agent to navigate to a remote location following the natural language instruction in 3D environments.

★ 53 · 24 Jul 2023
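The core idea of a grid memory map can be illustrated with a minimal sketch: discretize the environment into a top-down grid and aggregate visual features observed at each position into the corresponding cell. The class name, grid size, and running-mean aggregation below are illustrative assumptions, not the GridMM paper's actual implementation.

```python
import numpy as np

class GridMemoryMap:
    """Hypothetical top-down grid that accumulates per-cell visual features."""

    def __init__(self, grid_size=14, cell_meters=0.5, feat_dim=768):
        self.grid_size = grid_size
        self.cell_meters = cell_meters
        self.features = np.zeros((grid_size, grid_size, feat_dim))
        self.counts = np.zeros((grid_size, grid_size), dtype=int)

    def _to_cell(self, x, z):
        # Map world coordinates (meters) to grid indices, origin at the center.
        half = self.grid_size // 2
        i = int(x / self.cell_meters) + half
        j = int(z / self.cell_meters) + half
        return (np.clip(i, 0, self.grid_size - 1),
                np.clip(j, 0, self.grid_size - 1))

    def update(self, x, z, feature):
        # Maintain a running mean of all features observed in this cell.
        i, j = self._to_cell(x, z)
        self.counts[i, j] += 1
        self.features[i, j] += (feature - self.features[i, j]) / self.counts[i, j]

    def read(self, x, z):
        i, j = self._to_cell(x, z)
        return self.features[i, j]
```

An agent would call `update` after each observation and `read` (or attend over `features`) when predicting the next action, giving it a persistent spatial memory of the scene.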

Learning Navigational Visual Representations with Semantic Map Supervision

yiconghong/ego2map-navit ICCV 2023

Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot.

★ 24 · 23 Jul 2023

Learning Vision-and-Language Navigation from YouTube Videos

jeremylinky/youtube-vln ICCV 2023

In this paper, we propose to learn an agent from these videos by creating a large-scale dataset which comprises reasonable path-instruction pairs from house tour videos and pre-training the agent on it.

★ 32 · 22 Jul 2023

Behavioral Analysis of Vision-and-Language Navigation Agents

yoark/vln-behave CVPR 2023

To be successful, Vision-and-Language Navigation (VLN) agents must be able to ground instructions to actions based on their surroundings.

★ 9 · 20 Jul 2023

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

raphael-sch/velma 12 Jul 2023

In this work, we propose VELMA, an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as contextual prompt for the next action.

★ 9 · 12 Jul 2023
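Verbalizing a trajectory into an LLM prompt can be sketched as plain string construction: each past action and observation becomes a sentence, and the model is asked for the next action. The template and field names below are assumptions in the spirit of VELMA, not the paper's actual prompt format.

```python
def verbalize(instruction, steps, landmarks_visible):
    """Hypothetical verbalizer: turn an instruction, past (action, observation)
    pairs, and currently visible landmarks into a contextual prompt string."""
    lines = [f"Instruction: {instruction}"]
    for t, (action, observation) in enumerate(steps, start=1):
        lines.append(f"Step {t}: I chose '{action}'. I see {observation}.")
    if landmarks_visible:
        lines.append("Visible landmarks: " + ", ".join(landmarks_visible))
    lines.append("Next action (forward/left/right/stop):")
    return "\n".join(lines)
```

The resulting string would be sent to an LLM as the prompt at every step; the model's completion is parsed back into one of the listed actions.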

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

gengzezhou/navgpt 26 May 2023

Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling.

★ 81 · 26 May 2023