Vision and Language Navigation
88 papers with code • 5 benchmarks • 13 datasets
Latest papers with no code
AIGeN: An Adversarial Approach for Instruction Generation in VLN
VLN is a challenging task that involves an agent following human instructions and navigating in a previously unknown environment to reach a specified goal.
IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation
To address this challenge, we propose a new method, named Instance-aware Visual Language Map (IVLMap), to empower the robot with instance-level and attribute-level semantic mapping. The map is constructed autonomously by fusing RGBD video data collected by the robot agent with specially designed natural-language map indexing in the bird's-eye view.
Scaling Vision-and-Language Navigation With Offline RL
The study of vision-and-language navigation (VLN) has typically relied on expert trajectories, which may not always be available in real-world situations due to the significant effort required to collect them.
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation
Recent advances in Iterative Vision-and-Language Navigation (IVLN) introduce a more meaningful and practical paradigm of VLN by maintaining the agent's memory across tours of scenes.
Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation
To avoid this problem, we construct object connections, called Spatial Object Relations (SOR), based on observations from all viewpoints in the navigation environment, which ensures complete spatial coverage and eliminates the gap.
Continual Vision-and-Language Navigation
For the training and evaluation of CVLN agents, we rearrange existing VLN datasets into two new datasets: CVLN-I, focused on navigation via initial-instruction interpretation, and CVLN-D, aimed at navigation through dialogue with other agents.
Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation
Moreover, we formally define the task of Instruction Error Detection and Localization, and establish an evaluation protocol on top of our benchmark dataset.
Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning
To encourage the agent to capture the differences introduced by perturbation, we further develop a perturbation-aware contrastive learning mechanism that contrasts perturbation-free trajectory encodings with their perturbation-based counterparts.
Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) has gained significant research interest in recent years due to its potential applications in real-world scenarios.
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions.