Vision-Language Navigation
31 papers with code • 1 benchmark • 7 datasets
Vision-language navigation (VLN) is the task of steering an embodied agent through real 3D environments to carry out natural language instructions.
(Image credit: Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout)
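To make the task setup concrete, here is a minimal sketch of a VLN episode loop. The `Env` and `Agent` classes below are hypothetical stand-ins for a photo-realistic simulator and a navigation policy, not any particular paper's implementation: the agent receives an instruction plus the current observation and picks discrete actions until it decides to stop.

```python
# Minimal vision-language navigation episode loop (illustrative only).
# `Env` and `Agent` are hypothetical stand-ins for a 3D simulator
# and a learned navigation policy.

STOP = "stop"

class Env:
    """Toy environment: a chain of three viewpoints."""
    def __init__(self):
        self.position = 0

    def observation(self):
        # A real simulator returns panoramic RGB; here, just a viewpoint id.
        return {"viewpoint": self.position}

    def step(self, action):
        if action == "forward":
            self.position += 1
        return self.observation()

class Agent:
    """Toy policy: walk forward twice, then stop."""
    def act(self, instruction, obs):
        return "forward" if obs["viewpoint"] < 2 else STOP

def run_episode(env, agent, instruction, max_steps=20):
    obs = env.observation()
    for _ in range(max_steps):
        action = agent.act(instruction, obs)
        if action == STOP:
            break
        obs = env.step(action)
    return obs

env, agent = Env(), Agent()
final = run_episode(env, agent, "Walk down the hallway and stop at the door.")
print(final)  # {'viewpoint': 2}
```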
Latest papers
CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations
Empirically, on the Room-Across-Room dataset, we show that our multilingual agent achieves large improvements in all metrics over the strong baseline model when generalizing to unseen environments, owing to the cross-lingual language representation and the environment-agnostic visual representation.
Reinforced Structured State-Evolution for Vision-Language Navigation
However, crucial navigation clues for the embodied navigation task (i.e., the object-level environment layout) are discarded, since the maintained state vector is essentially unstructured.
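As a rough illustration of the structured-state idea this snippet motivates, an agent can maintain an object-level memory rather than a single flat vector. The sketch below is a hypothetical simplification, not the paper's actual Structured state-Evolution model:

```python
# Sketch of an object-level, structured state memory (hypothetical
# simplification; not the paper's actual model).
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    label: str          # e.g., "sofa"
    viewpoint: int      # viewpoint where the object was observed
    heading: float      # relative direction in radians

@dataclass
class StructuredState:
    nodes: list = field(default_factory=list)

    def update(self, detections, viewpoint):
        """Accumulate detected objects instead of overwriting a flat vector."""
        for label, heading in detections:
            self.nodes.append(ObjectNode(label, viewpoint, heading))

    def objects_near(self, viewpoint):
        return [n.label for n in self.nodes if n.viewpoint == viewpoint]

state = StructuredState()
state.update([("sofa", 0.3), ("lamp", -1.2)], viewpoint=0)
state.update([("door", 0.0)], viewpoint=1)
print(state.objects_near(1))  # ['door']
```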
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
Since the rise of vision-language navigation (VLN), great progress has been made in instruction following -- building a follower to navigate environments under the guidance of instructions.
Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration
To improve the ability of fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which can self-explore environments by sampling trajectories and automatically generate structured instructions via a large-scale cross-modal pretrained model (CLIP).
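A hedged sketch of the sample-then-label idea: match each sampled view against candidate landmark phrases with CLIP, then fill a text template. The CLIP calls follow the openai/CLIP package API; the landmark list, template, and `best_landmark` helper are simplified assumptions, not the ProbES pipeline.

```python
# Sketch: label a sampled trajectory with a templated instruction by
# scoring each view against landmark phrases with CLIP.
# Hypothetical simplification, not the ProbES implementation.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

landmarks = ["a kitchen", "a staircase", "a bedroom", "a hallway"]
text_tokens = clip.tokenize(landmarks).to(device)

def best_landmark(image_path):
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        image_feat = model.encode_image(image)
        text_feat = model.encode_text(text_tokens)
        # Normalize so the dot product is cosine similarity.
        image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        sims = image_feat @ text_feat.T
    return landmarks[sims.argmax().item()]

def generate_instruction(view_paths):
    # One landmark per sampled view, joined into a templated instruction.
    steps = [f"walk past {best_landmark(p)}" for p in view_paths]
    return ", then ".join(steps) + "."

# Usage (requires image files for the sampled views):
# print(generate_instruction(["view0.jpg", "view1.jpg"]))
```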
A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility
To study VLN with unknown command feasibility, we introduce a new dataset, Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app.
Contrastive Instruction-Trajectory Learning for Vision-Language Navigation
The vision-language navigation (VLN) task requires an agent to reach a target under the guidance of a natural language instruction.
Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation
Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.
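A toy sketch of the word-perturbation idea: mask the token judged most instructive to the navigator. The saliency heuristic below is a hypothetical stand-in; the actual DR-Attacker learns which words to attack via reinforcement learning.

```python
# Toy instruction attack: mask the most instructive token.
# The hand-written saliency heuristic is a hypothetical stand-in
# for DR-Attacker's learned scorer.
LANDMARKS = {"door", "stairs", "kitchen", "sofa", "hallway"}
DIRECTIONS = {"left", "right", "forward", "straight"}

def saliency(token):
    # Assumption: landmarks and directions carry the navigation signal.
    t = token.lower().strip(".,")
    if t in LANDMARKS:
        return 2.0
    if t in DIRECTIONS:
        return 1.0
    return 0.0

def attack(instruction):
    tokens = instruction.split()
    scores = [saliency(t) for t in tokens]
    target = max(range(len(tokens)), key=lambda i: scores[i])
    tokens[target] = "[MASK]"  # destroy the most instructive word
    return " ".join(tokens)

print(attack("Turn left at the stairs and stop by the door."))
# Turn left at the [MASK] and stop by the door.
```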
Vision-Language Navigation with Random Environmental Mixup
Then, we cross-connect the key views of different scenes to construct augmented scenes.
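A simplified sketch of cross-connecting scenes: trajectories are modeled as plain lists of viewpoint ids and spliced at randomly chosen key views. This is a hypothetical simplification of the paper's scene splicing, for illustration only.

```python
# Sketch of Random Environmental Mixup: splice the prefix of a
# trajectory in scene A onto the suffix of a trajectory in scene B
# at randomly chosen "key views". Hypothetical simplification.
import random

def mixup_trajectories(traj_a, traj_b, rng=random):
    # Pick a key view in each trajectory (excluding the start).
    cut_a = rng.randrange(1, len(traj_a))
    cut_b = rng.randrange(1, len(traj_b))
    # Cross-connect: prefix from scene A, suffix from scene B.
    return traj_a[:cut_a] + traj_b[cut_b:]

traj_a = ["A0", "A1", "A2", "A3"]  # viewpoints in scene A
traj_b = ["B0", "B1", "B2", "B3"]  # viewpoints in scene B
random.seed(0)
print(mixup_trajectories(traj_a, traj_b))  # e.g. ['A0', 'A1', 'B3']
```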
Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information
One key challenge in this task is to ground instructions with the current visual information that the agent perceives.
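As an illustration of how syntactic information can help with grounding, the sketch below pulls landmark nouns and action verbs out of a dependency parse with spaCy. The part-of-speech heuristic is an assumption for illustration, not the paper's alignment module.

```python
# Sketch: extract landmark nouns and action verbs from an instruction
# via a spaCy parse, so visual grounding can attend to them.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def syntactic_cues(instruction):
    doc = nlp(instruction)
    landmarks = [t.text for t in doc if t.pos_ == "NOUN"]
    actions = [t.lemma_ for t in doc if t.pos_ == "VERB"]
    return landmarks, actions

landmarks, actions = syntactic_cues(
    "Walk past the sofa, turn left at the stairs, and stop at the door.")
print(landmarks)  # e.g. ['sofa', 'stairs', 'door']
print(actions)    # e.g. ['walk', 'turn', 'stop']
```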
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.