Vision and Language Navigation

88 papers with code • 5 benchmarks • 13 datasets

Vision-and-Language Navigation (VLN) requires an embodied agent to follow natural-language instructions and navigate through a real 3D environment to reach a target location.

Most implemented papers

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training

weituo12321/PREVALENT CVPR 2020

By training on a large number of image-text-action triplets in a self-supervised manner, the pre-trained model provides generic representations of visual environments and language instructions.
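
The sketch below illustrates the kind of self-supervised objective this description points at: a transformer fused over image features, instruction tokens, and actions, trained with a masked-token loss plus an action-prediction loss. All module names, dimensions, and the specific loss combination are illustrative assumptions, not the authors' implementation.

```python
# Toy pre-training on (image, instruction, action) triplets.
# Sizes, heads, and losses are assumptions for illustration only.
import torch
import torch.nn as nn

class TripletPretrainer(nn.Module):
    def __init__(self, vocab_size=1000, img_dim=2048, d_model=256, n_actions=6):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.img_proj = nn.Linear(img_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mlm_head = nn.Linear(d_model, vocab_size)   # masked-token prediction
        self.act_head = nn.Linear(d_model, n_actions)    # next-action prediction

    def forward(self, img_feats, token_ids, mlm_labels, action_labels):
        # Fuse projected image regions with embedded instruction tokens.
        x = torch.cat([self.img_proj(img_feats), self.tok_emb(token_ids)], dim=1)
        h = self.encoder(x)
        text_h = h[:, img_feats.size(1):]                # hidden states over text
        logits = self.mlm_head(text_h)
        mlm_loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), mlm_labels.reshape(-1),
            ignore_index=-100)                           # -100 marks unmasked tokens
        act_loss = nn.functional.cross_entropy(self.act_head(h[:, 0]), action_labels)
        return mlm_loss + act_loss

# Toy usage with random data (real MLM would mask only a fraction of tokens).
model = TripletPretrainer()
loss = model(torch.randn(2, 5, 2048),
             torch.randint(0, 1000, (2, 12)),
             torch.randint(0, 1000, (2, 12)),
             torch.randint(0, 6, (2,)))
loss.backward()
```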

Sub-Instruction Aware Vision-and-Language Navigation

YicongHong/Fine-Grained-R2R EMNLP 2020

Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

arjunmajum/vln-bert ECCV 2020

Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g., 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Diagnosing the Environment Bias in Vision-and-Language Navigation

zhangybzbo/EnvBiasVLN 6 May 2020

Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations.

BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps

Sha-Lab/babywalk ACL 2020

We propose BabyWalk, a new VLN agent that learns to navigate by decomposing long instructions into shorter ones (BabySteps) and completing them sequentially.
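
A minimal sketch of the "decompose, then execute sequentially" idea follows. The punctuation-based splitter and the dummy `follow_sub_instruction` policy are placeholders, not BabyWalk's actual segmentation or agent.

```python
# Decompose a long instruction into BabySteps and complete them in order.
import re

def split_into_baby_steps(instruction: str):
    """Split a long instruction into shorter segments on simple punctuation cues."""
    parts = re.split(r",\s*(?:then|and)\s+|\.\s*", instruction)
    return [p.strip() for p in parts if p.strip()]

def follow_sub_instruction(state, sub_instruction):
    """Placeholder policy: a real agent would run its navigation model here."""
    print(f"from state {state}: executing '{sub_instruction}'")
    return state + 1  # pretend we advanced one node along the path

def babywalk_navigate(start_state, instruction):
    state = start_state
    for step in split_into_baby_steps(instruction):
        state = follow_sub_instruction(state, step)  # complete BabySteps sequentially
    return state

final = babywalk_navigate(
    0, "Walk down the stairs, then turn left at the kitchen, and stop at the brown sofa.")
```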

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

VegB/VLN-Transformer EACL 2021

Outdoor vision-and-language navigation (VLN) is a task in which an agent follows natural language instructions to navigate a real-life urban environment.

Language and Visual Entity Relationship Graph for Agent Navigation

YicongHong/Entity-Graph-VLN NeurIPS 2020

From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment.
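
As a toy illustration of modelling relationships among scene, object, and direction cues, the snippet below runs one round of message passing over a small hand-built graph. The node set, adjacency structure, and single linear update are invented for illustration and do not reproduce the paper's model.

```python
# One round of neighbour aggregation over scene / object / direction nodes.
import numpy as np

rng = np.random.default_rng(0)
nodes = ["scene", "sofa", "stairs", "left", "forward"]   # hypothetical entities
feat = rng.normal(size=(len(nodes), 8))                  # per-node features

# Scene connects to everything; objects connect to directions mentioned with them.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (2, 4), (1, 3)]
adj = np.eye(len(nodes))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
adj /= adj.sum(axis=1, keepdims=True)    # row-normalise (mean aggregation)

W = rng.normal(size=(8, 8)) * 0.1
updated = np.tanh(adj @ feat @ W)        # refined node representations
print(updated.shape)                     # (5, 8)
```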

Sim-to-Real Transfer for Vision-and-Language Navigation

batra-mlp-lab/vln-sim2real 7 Nov 2020

We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.

A Recurrent Vision-and-Language BERT for Navigation

YicongHong/Recurrent-VLN-BERT 26 Nov 2020

In this paper we propose a recurrent BERT model that is time-aware for use in VLN.
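
The sketch below shows the recurrent-state idea in miniature: a single state vector is re-encoded by a transformer together with the instruction and the current candidate views at every step, and carried forward to the next step. Layer sizes, the random inputs, and the dot-product action scoring are assumptions for illustration only.

```python
# A recurrent state token updated by a transformer at each navigation step.
import torch
import torch.nn as nn

d = 128
layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
score_candidates = nn.Linear(d, d)  # compares the state against candidate views

text = torch.randn(1, 10, d)        # encoded instruction (fixed across steps)
state = torch.zeros(1, 1, d)        # recurrent state token

for t in range(3):                  # three navigation steps
    candidates = torch.randn(1, 4, d)                 # navigable-view features
    x = torch.cat([state, text, candidates], dim=1)
    h = encoder(x)
    state = h[:, :1]                                  # updated state, kept for t+1
    logits = (score_candidates(state) * h[:, -4:]).sum(-1)  # score each candidate
    action = logits.argmax(-1)
    print(f"step {t}: chose candidate {action.item()}")
```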

Diagnosing Vision-and-Language Navigation: What Really Matters

VegB/Diagnose_VLN NAACL 2022

Results show that indoor navigation agents refer to both object and direction tokens when making decisions.