Vision and Language Navigation

88 papers with code • 5 benchmarks • 13 datasets

Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions and navigate through a previously unseen environment to reach a specified goal.

Latest papers with no code

AIGeN: An Adversarial Approach for Instruction Generation in VLN

no code yet • 15 Apr 2024

VLN is a challenging task that involves an agent following human instructions and navigating in a previously unknown environment to reach a specified goal.

IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation

no code yet • 28 Mar 2024

To address this challenge, we propose a new method, Instance-aware Visual Language Map (IVLMap), to empower the robot with instance-level and attribute-level semantic mapping; the map is autonomously constructed by fusing RGBD video data collected by the robot agent with specially designed natural-language map indexing in the bird's-eye view.
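
The excerpt names the ingredients, and the core operation — projecting RGBD observations into an instance-labeled bird's-eye-view grid — is concrete enough to sketch. Below is a minimal illustration of that projection step; the function name, grid scheme, and fusion strategy are assumptions for exposition, not the paper's implementation.

```python
# Minimal sketch of instance-level bird's-eye-view (BEV) mapping in the
# spirit of IVLMap. All names, the grid scheme, and the fusion strategy
# are illustrative assumptions, not the paper's implementation.
import numpy as np

def rgbd_to_bev(depth, intrinsics, pose, instance_ids, grid_size=200, cell_m=0.05):
    """Project one RGBD frame into a top-down instance grid.

    depth:        (H, W) depth in meters
    intrinsics:   dict with fx, fy, cx, cy
    pose:         (4, 4) camera-to-world transform
    instance_ids: (H, W) per-pixel instance labels (0 = background)
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    # Back-project pixels to camera-frame 3D points.
    x = (u - intrinsics["cx"]) * z / intrinsics["fx"]
    y = (v - intrinsics["cy"]) * z / intrinsics["fy"]
    pts = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    world = (pose @ pts.T).T[:, :3]  # camera frame -> world frame

    bev = np.zeros((grid_size, grid_size), dtype=np.int32)
    # Quantize world x/z onto the grid, centered on the map origin.
    gx = (world[:, 0] / cell_m + grid_size // 2).astype(int)
    gz = (world[:, 2] / cell_m + grid_size // 2).astype(int)
    keep = (gx >= 0) & (gx < grid_size) & (gz >= 0) & (gz < grid_size)
    keep &= depth.reshape(-1) > 0  # drop invalid depth readings
    bev[gz[keep], gx[keep]] = instance_ids.reshape(-1)[keep]
    return bev  # successive frames would be fused, e.g. by voting per cell
```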

Scaling Vision-and-Language Navigation With Offline RL

no code yet • 27 Mar 2024

The study of vision-and-language navigation (VLN) has typically relied on expert trajectories, which may not always be available in real-world situations due to the significant effort required to collect them.

OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation

no code yet • 26 Mar 2024

Recent advances in Iterative Vision-and-Language Navigation (IVLN) introduce a more meaningful and practical paradigm of VLN by maintaining the agent's memory across tours of scenes.

Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation

no code yet • 23 Mar 2024

To avoid this problem, we construct object connections based on observations from all viewpoints in the navigational environment, called Spatial Object Relations (SOR), which ensures complete spatial coverage and eliminates the gap.
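
One compact way to picture SOR-style construction is a pairwise relation graph accumulated over every viewpoint. The sketch below is hypothetical: the relation features (pairwise distance and heading) and the aggregation rule are assumptions, since the excerpt does not specify them.

```python
# Illustrative sketch of building object connections from observations at
# every viewpoint, in the spirit of Spatial Object Relations (SOR). The
# graph structure and relation features are assumptions for exposition.
from collections import defaultdict
from itertools import combinations
import math

def build_object_relations(viewpoints):
    """viewpoints: list of dicts mapping object_id -> (x, y, z) position,
    one dict per viewpoint, listing the objects visible from it."""
    obs = defaultdict(list)
    for objects in viewpoints:
        for (a, pa), (b, pb) in combinations(sorted(objects.items()), 2):
            dist = math.dist(pa, pb)
            heading = math.atan2(pb[1] - pa[1], pb[0] - pa[0])
            obs[(a, b)].append((dist, heading))
    # Aggregate over all viewpoints so the relation graph covers the whole
    # environment instead of a single partial observation.
    relations = {}
    for pair, samples in obs.items():
        mean_dist = sum(d for d, _ in samples) / len(samples)
        # Circular mean for headings (naive averaging breaks at the +/-pi wrap).
        sin_h = sum(math.sin(h) for _, h in samples)
        cos_h = sum(math.cos(h) for _, h in samples)
        relations[pair] = (mean_dist, math.atan2(sin_h, cos_h))
    return relations
```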

Continual Vision-and-Language Navigation

no code yet • 22 Mar 2024

For the training and evaluation of CVLN agents, we re-arrange existing VLN datasets to propose two datasets: CVLN-I, focused on navigation via initial-instruction interpretation, and CVLN-D, aimed at navigation through dialogue with other agents.

Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation

no code yet • 15 Mar 2024

Moreover, we formally define the task of Instruction Error Detection and Localization, and establish an evaluation protocol on top of our benchmark dataset.

Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning

no code yet • 9 Mar 2024

To encourage the agent to capture the differences introduced by perturbations, a perturbation-aware contrastive learning mechanism is further developed that contrasts perturbation-free trajectory encodings with their perturbation-based counterparts.
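
Contrasting clean and perturbed trajectory encodings is a standard contrastive setup, so a short sketch can make the mechanism concrete. The loss below is an InfoNCE-style stand-in; the paper's actual objective, encoder, and hyperparameters are not given in this excerpt, so every detail here is an assumption.

```python
# Minimal sketch of a perturbation-aware contrastive objective: pull each
# perturbed trajectory encoding toward its perturbation-free counterpart and
# push it away from other trajectories in the batch (InfoNCE-style). Names
# and the exact loss are assumptions, not the paper's formulation.
import torch
import torch.nn.functional as F

def perturbation_contrastive_loss(clean_enc, perturbed_enc, temperature=0.1):
    """clean_enc, perturbed_enc: (B, D) trajectory encodings, where row i of
    each tensor comes from the same underlying trajectory."""
    clean = F.normalize(clean_enc, dim=-1)
    pert = F.normalize(perturbed_enc, dim=-1)
    logits = pert @ clean.t() / temperature  # (B, B) cosine-similarity matrix
    targets = torch.arange(clean.size(0), device=clean.device)
    # Diagonal entries are the positive (matched) pairs.
    return F.cross_entropy(logits, targets)
```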

Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation

no code yet • 6 Mar 2024

Vision-and-Language Navigation (VLN) has gained significant research interest in recent years due to its potential applications in real-world scenarios.

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

no code yet • 24 Feb 2024

Vision-and-Language Navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions.