Vision-Language Navigation

31 papers with code • 1 benchmark • 7 datasets

Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments.

(Image credit: Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout)

Benchmarks

These leaderboards track progress in vision-language navigation.

Dataset	Best Model
Room2Room	R2R+EnvDrop

Most implemented papers

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation

chihyaoma/regretful-agent • CVPR 2019

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

chihyaoma/selfmonitoring-agent • ICLR 2019

The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic unknown environments.

Cross-Lingual Vision-Language Navigation

zzxslp/Crosslingual-VLN • 24 Oct 2019

Commanding a robot to navigate with natural language instructions is a long-term goal for grounded language understanding and robotics.

Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation

peteanderson80/Matterport3DSimulator • ECCV 2018

In this paper, we take a radical approach to bridge the gap between synthetic studies and real-world practices: we propose a novel, planned-ahead hybrid reinforcement learning model that combines model-free and model-based reinforcement learning to solve a real-world vision-language navigation task.

The Regretful Navigation Agent for Vision-and-Language Navigation

chihyaoma/regretful-agent • CVPR 2019 (Oral)

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Kelym/FAST • CVPR 2019

We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language Navigation challenge of Anderson et al.

Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout

airsplay/R2R-EnvDrop • NAACL 2019

Next, we apply semi-supervised learning (via back-translation) on these dropped-out environments to generate new paths and instructions.

Environment-agnostic Multitask Learning for Natural Language Grounded Navigation

google-research/valan • ECCV 2020

Recent research efforts enable the study of natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog.

Active Visual Information Gathering for Vision-Language Navigation

HanqingWangAI/Active_VLN • ECCV 2020

Vision-language navigation (VLN) is the task of requiring an agent to carry out navigational instructions inside photo-realistic environments.

A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment

Homagn/MOVILAN • • 19 Jan 2021

In this paper, we propose a new framework, MoViLan (Modular Vision and Language), for executing visually grounded natural language instructions for day-to-day indoor household tasks.
