Vision and Language Navigation
88 papers with code • 5 benchmarks • 13 datasets
Most implemented papers
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
By training on a large number of image-text-action triplets in a self-supervised manner, the pre-trained model provides generic representations of visual environments and language instructions.
Sub-Instruction Aware Vision-and-Language Navigation
Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g., 'stairs') to visual content in the environment (pixels corresponding to 'stairs').
Diagnosing the Environment Bias in Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations.
BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps
To this end, we propose BabyWalk, a new VLN agent that learns to navigate by decomposing long instructions into shorter ones (BabySteps) and completing them sequentially.
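The decomposition idea can be illustrated with a minimal sketch. This is not BabyWalk's learned segmentation model; it is a crude heuristic stand-in that splits a long instruction into BabySteps at common connectives, and the function name is illustrative only.

```python
import re

def decompose_instruction(instruction):
    """Split a long navigation instruction into short sub-instructions
    (BabySteps) at connectives like ', then' and 'and then' -- a crude
    heuristic stand-in for BabyWalk's learned decomposition."""
    parts = re.split(r',\s*(?:then\s+)?|\s+and then\s+|\.\s*', instruction)
    return [p.strip() for p in parts if p.strip()]

steps = decompose_instruction(
    "Walk down the stairs, then turn left and then stop at the brown sofa."
)
# Each BabyStep would then be handed to the agent and completed in order.
```

In the paper itself the agent is trained with curriculum reinforcement learning to complete such sub-instructions sequentially while attending to its past trajectory.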
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation
Outdoor vision-and-language navigation (VLN) is a task in which an agent follows natural language instructions to navigate a real-life urban environment.
Language and Visual Entity Relationship Graph for Agent Navigation
From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment.
Sim-to-Real Transfer for Vision-and-Language Navigation
We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.
A Recurrent Vision-and-Language BERT for Navigation
In this paper we propose a recurrent BERT model that is time-aware for use in VLN.
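The recurrence can be sketched abstractly: instead of re-encoding the full history at every step, the agent carries a single state forward in time and updates it with each new observation. This is a minimal illustration of the control flow only, not the paper's transformer implementation; the class and parameter names are assumptions.

```python
class RecurrentVLNAgent:
    """Toy sketch of a time-aware recurrent navigation agent:
    one state is updated step by step rather than recomputed."""

    def __init__(self, encode, policy):
        self.encode = encode  # (state, observation) -> new state
        self.policy = policy  # state -> action

    def run(self, init_state, observations):
        state, actions = init_state, []
        for obs in observations:
            # The recurrent state plays the role of the model's
            # carried-over token, summarizing the trajectory so far.
            state = self.encode(state, obs)
            actions.append(self.policy(state))
        return actions
```

With stand-in functions for `encode` and `policy`, each step's action depends on the accumulated state, which is the property the recurrent design provides for navigation.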
Diagnosing Vision-and-Language Navigation: What Really Matters
Results show that indoor navigation agents refer to both object and direction tokens when making decisions.