Vision-Language Navigation

31 papers with code • 1 benchmark • 7 datasets

Vision-language navigation (VLN) is the task of guiding an embodied agent through real 3D environments to carry out natural language instructions.

(Image credit: Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout)

Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation

mrzihan/hnr-vln 2 Apr 2024

Vision-and-language navigation (VLN) enables an agent to navigate to a remote location in a 3D environment by following natural language instructions.


Volumetric Environment Representation for Vision-Language Navigation

defaultrui/vln-ver 21 Mar 2024

To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.
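The voxelization idea behind a representation like VER can be sketched in a few lines: quantize 3D points into a fixed grid of structured cells and aggregate per cell. This is an illustrative sketch only; the `voxelize` helper, cell size, and grid shape are assumptions, not the paper's implementation:

```python
import numpy as np

def voxelize(points, cell_size=0.25, grid_shape=(40, 40, 20)):
    """Quantize a 3D point cloud (N x 3, metres) into a fixed grid of
    structured cells, counting the points that fall in each cell.
    Illustrative sketch only -- not the VER paper's implementation."""
    grid = np.zeros(grid_shape, dtype=np.int32)
    idx = np.floor(points / cell_size).astype(int)
    # Discard points that land outside the grid bounds.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    for i, j, k in idx[inside]:
        grid[i, j, k] += 1
    return grid
```

In VER itself the cells hold learned features aggregated from 2D views rather than raw point counts, but the cell structure is the same idea.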


Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty

joeyy5588/planning-as-inpainting 2 Dec 2023

In this paper, we aim to tackle this problem with a unified framework consisting of an end-to-end trainable method and a planning algorithm.


An Embodied Generalist Agent in 3D World

embodied-generalist/embodied-generalist 18 Nov 2023

Leveraging massive knowledge and learning schemes from large language models (LLMs), recent machine learning models show notable successes in building generalist agents that exhibit the capability of general-purpose task solving in diverse domains, including natural language processing, computer vision, and robotics.


Bird's-Eye-View Scene Graph for Vision-Language Navigation

defaultrui/bev-scene-graph ICCV 2023

Vision-language navigation (VLN), which requires an agent to navigate 3D environments by following human instructions, has seen great advances.

09 Aug 2023

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments

marsaki/etpnav 6 Apr 2023

To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the ability to abstract environments and generate long-range navigation plans, and 2) the ability to perform obstacle-avoiding control in continuous environments.
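The long-range planning half of such a framework is typically graph search over a topological map whose nodes are visited or candidate viewpoints. A minimal sketch, assuming an adjacency-dict graph format and a `shortest_path` helper of my own naming, not ETPNav's code:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search over a topological map: nodes are viewpoints,
    edges connect traversable neighbours. Returns a node sequence or None.
    Illustrative sketch only -- not ETPNav's implementation."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back-pointers to the start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in graph.get(node, ()):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None
```

The second skill, low-level obstacle-avoiding control, would then translate each edge of the returned plan into continuous actions.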


Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation

chengaopro/azhp CVPR 2023

In this paper, we propose an Adaptive Zone-aware Hierarchical Planner (AZHP) that explicitly divides the navigation process into two heterogeneous phases for hierarchical planning: sub-goal setting via zone partition/selection (high-level actions) and sub-goal execution (low-level actions).

01 Jan 2023

Towards Versatile Embodied Navigation

hanqingwangai/vxn 30 Oct 2022

With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents that handle individual navigation tasks well.


DANLI: Deliberative Agent for Following Natural Language Instructions

sled-group/danli 22 Oct 2022

These reactive agents are insufficient for long-horizon complex tasks.


Target-Driven Structured Transformer Planner for Vision-Language Navigation

yushengzhao/td-stp 19 Jul 2022

Vision-language navigation is the task of directing an embodied agent to navigate in 3D scenes with natural language instructions.
