Vision-Language Navigation
31 papers with code • 1 benchmark • 7 datasets
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments.
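The task is usually framed as a sequential decision loop: at each step the agent receives an observation of the environment together with the instruction and emits a navigation action, including an explicit "stop" action. A minimal sketch of that loop, using a toy one-dimensional environment and a hypothetical action set (real benchmarks return RGB panoramas and use simulator-specific APIs):

```python
# Hypothetical discrete action set; real VLN benchmarks define their own.
ACTIONS = ["forward", "stop"]

class ToyEnv:
    """Stand-in environment: a corridor of `length` cells; 'forward' advances."""
    def __init__(self, length=3):
        self.length = length
        self.pos = 0

    def observe(self):
        # A real benchmark would return visual observations; here, just the position.
        return {"position": self.pos}

    def step(self, action):
        if action == "forward":
            self.pos = min(self.pos + 1, self.length)
        done = action == "stop"
        success = done and self.pos == self.length
        return self.observe(), done, success

def agent_policy(instruction, obs):
    """Toy policy: walk forward until reaching the cell named in the instruction."""
    goal = int(instruction.split()[-1])
    return "forward" if obs["position"] < goal else "stop"

def run_episode(instruction, env):
    obs, done, success = env.observe(), False, False
    while not done:
        obs, done, success = env.step(agent_policy(instruction, obs))
    return success

print(run_episode("walk forward to cell 3", ToyEnv(length=3)))  # True
```

A learned agent replaces `agent_policy` with a model conditioned on the instruction and visual history; episode success is typically judged by whether the agent stops near the goal.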
(Image credit: Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout)
Latest papers with no code
On the Importance of Karaka Framework in Multi-modal Grounding
The Computational Paninian Grammar model helps decode a natural language expression as a series of modifier-modified relations, and therefore facilitates identifying dependency relations closer to the language's contextual semantics than the usual Stanford dependency relations.
Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration
To improve the ability of fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which can self-explore environments by sampling trajectories and automatically generate structured instructions via a large-scale cross-modal pretrained model (CLIP).
Vision-Language Navigation: A Survey and Taxonomy
This paper provides a comprehensive survey and an insightful taxonomy of these tasks based on the different characteristics of language instructions in these tasks.
Modular Graph Attention Network for Complex Visual Relational Reasoning
Moreover, to capture the complex logic in a query, we construct a relational graph to represent the visual objects and their relationships, and propose a multi-step reasoning method to progressively understand the complex logic.
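One common reading of "multi-step reasoning over a relational graph" is repeated attention-weighted message passing among object nodes, with attention biased by the query. A minimal numpy sketch under that reading (the dimensions, scoring rule, and update rule are illustrative, not the paper's architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reasoning_step(nodes, adj, query):
    """One message-passing step: each node attends to its neighbours,
    with attention scores biased by the query vector."""
    scores = nodes @ query                                 # (N,) per-node relevance
    # Mask out non-edges, then normalise over each node's neighbourhood.
    attn = softmax(np.where(adj > 0, scores[None, :], -1e9), axis=1)  # (N, N)
    messages = attn @ nodes                                # aggregate neighbour features
    return 0.5 * nodes + 0.5 * messages                    # residual-style update

rng = np.random.default_rng(0)
N, D = 4, 8
nodes = rng.normal(size=(N, D))     # visual object features
adj = np.ones((N, N)) - np.eye(N)   # fully connected relation graph (no self-loops)
query = rng.normal(size=D)          # encoded query representation

# "Multi-step" reasoning: iterate the update a few times.
for _ in range(3):
    nodes = reasoning_step(nodes, adj, query)
print(nodes.shape)  # (4, 8)
```

Iterating the step lets query-relevant information propagate along multi-hop paths in the graph, which is the intuition behind progressively understanding the complex logic of a query.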
Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation
Vision-and-Language Navigation (VLN) is a natural language grounding task where an agent learns to follow language instructions and navigate to specified destinations in real-world environments.
Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to take advantage of the additional training signals derived from the semantic information.
Generalized Natural Language Grounded Navigation via Environment-agnostic Multitask Learning
Recent research efforts enable the study of natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog.
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments.