Vision-Language Navigation

31 papers with code • 1 benchmark • 7 datasets

Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments.

(Image credit: Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout)
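
As a concrete illustration of the task setup, the sketch below runs a single episode with a toy, hypothetical environment and a placeholder policy. Real benchmarks such as Room-to-Room on Matterport3D expose panoramic views and a navigation-graph action space; the interface here is invented purely for illustration.

```python
"""Minimal VLN episode loop. The environment and agent interfaces are
hypothetical stand-ins, not any benchmark's real API."""
import random


class ToyVLNEnv:
    """Hypothetical 1-D environment: a few viewpoints along a corridor."""

    def __init__(self, goal=3, max_steps=10):
        self.goal, self.max_steps = goal, max_steps

    def reset(self, instruction):
        self.pos, self.t = 0, 0
        self.instruction = instruction
        return {"viewpoint": self.pos, "instruction": instruction}

    def step(self, action):
        # actions: -1 = move back, +1 = move forward, 0 = STOP
        self.t += 1
        done = action == 0 or self.t >= self.max_steps
        if not done:
            self.pos += action
        success = done and self.pos == self.goal
        obs = {"viewpoint": self.pos, "instruction": self.instruction}
        return obs, float(success), done


def random_agent(obs):
    """Placeholder policy; a real agent would ground the instruction in the view."""
    return random.choice([-1, 0, 1])


env = ToyVLNEnv()
obs = env.reset("walk forward three steps and stop")
done = False
while not done:
    action = random_agent(obs)
    obs, reward, done = env.step(action)
print("episode finished, success reward =", reward)
```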

Latest papers with no code

On the Importance of Karaka Framework in Multi-modal Grounding

no code yet • 9 Apr 2022

The Computational Paninian Grammar model decodes a natural language expression as a series of modifier-modified relations, and therefore helps identify dependency relations that are closer to the language (context) semantics than the usual Stanford dependency relations.

Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration

no code yet • ACL ARR November 2021

To enable fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which self-explores environments by sampling trajectories and automatically generates structured instructions via a large-scale cross-modal pretrained model (CLIP).
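
ProbES's full pipeline is not reproduced in this snippet; the hypothetical sketch below only illustrates the underlying idea of labeling views along a sampled trajectory with an off-the-shelf CLIP model and filling a simple prompt template, using the Hugging Face transformers API. The candidate labels, blank stand-in images, and template are all assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Assumed vocabulary of landmark labels to score against each view.
candidate_objects = ["a sofa", "a staircase", "a kitchen counter", "a doorway"]


def label_view(image):
    """Pick the candidate label CLIP scores highest for this view."""
    inputs = processor(text=candidate_objects, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_labels)
    return candidate_objects[logits.argmax(dim=-1).item()]


# A sampled trajectory would supply one panorama per visited viewpoint;
# blank images stand in here so the sketch stays self-contained.
trajectory_views = [Image.new("RGB", (224, 224)) for _ in range(3)]
landmarks = [label_view(view) for view in trajectory_views]

# Hypothetical prompt template turning the landmark sequence into an instruction.
instruction = "Walk past " + ", then ".join(landmarks) + ", and stop."
print(instruction)
```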

Vision-Language Navigation: A Survey and Taxonomy

no code yet • 26 Aug 2021

This paper provides a comprehensive survey and an insightful taxonomy of VLN tasks based on the characteristics of their language instructions.

Modular Graph Attention Network for Complex Visual Relational Reasoning

no code yet • 22 Nov 2020

Moreover, to capture the complex logic in a query, we construct a relational graph to represent the visual objects and their relationships, and propose a multi-step reasoning method to understand this logic progressively.
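
The paper's exact architecture is not shown in this snippet; the sketch below is a generic single-head graph-attention layer over object nodes, roughly the kind of building block a relational graph module would use. The feature sizes and toy adjacency matrix are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGraphAttention(nn.Module):
    """Single-head graph attention over object nodes (a generic GAT-style
    layer, not the architecture from the paper)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, node_feats, adj):
        # node_feats: (N, in_dim); adj: (N, N), 1 where a relation exists
        h = self.proj(node_feats)                                 # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = F.leaky_relu(self.attn(pairs)).squeeze(-1)       # (N, N)
        scores = scores.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)                     # attention weights
        return alpha @ h                                          # aggregated features


# Toy relational graph: 4 detected objects with self-loops and chain relations.
feats = torch.randn(4, 16)
adj = torch.tensor([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 1, 1, 1],
                    [0, 0, 1, 1]])
layer = SimpleGraphAttention(16, 32)
print(layer(feats, adj).shape)  # torch.Size([4, 32])
```

Multi-step reasoning can then be approximated by applying such a layer repeatedly, letting information propagate further across the relational graph at each step.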

Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation

no code yet • Findings of the Association for Computational Linguistics: EMNLP 2020

Vision-and-Language Navigation (VLN) is a natural language grounding task where an agent learns to follow language instructions and navigate to specified destinations in real-world environments.

Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks

no code yet • CVPR 2020

In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks that exploit the additional training signals derived from semantic information.
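
The four auxiliary tasks themselves are not named in this snippet; the sketch below only shows, with assumed loss names and weights, the general pattern of folding self-supervised auxiliary objectives into the main navigation loss.

```python
import torch

# Hypothetical loss values; in practice each auxiliary head would produce its
# own self-supervised objective alongside the main navigation loss.
nav_loss = torch.tensor(1.2)
aux_losses = {"aux_task_1": torch.tensor(0.4),
              "aux_task_2": torch.tensor(0.7),
              "aux_task_3": torch.tensor(0.3),
              "aux_task_4": torch.tensor(0.5)}
aux_weights = {name: 0.25 for name in aux_losses}  # assumed weights, not from the paper

total_loss = nav_loss + sum(aux_weights[n] * aux_losses[n] for n in aux_losses)
print(total_loss)  # this combined loss would be backpropagated in training
```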

Generalized Natural Language Grounded Navigation via Environment-agnostic Multitask Learning

no code yet • 25 Sep 2019

Recent research efforts have enabled the study of natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog.

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

no code yet • CVPR 2019

Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments.