Vision and Language Navigation
88 papers with code • 5 benchmarks • 13 datasets
Most implemented papers
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
By training on a large number of image-text-action triplets in a self-supervised manner, the pre-trained model provides generic representations of visual environments and language instructions.
Sub-Instruction Aware Vision-and-Language Navigation
Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g., 'stairs') to visual content in the environment (pixels corresponding to 'stairs').
Diagnosing the Environment Bias in Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations.
BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps
To this end, we propose BabyWalk, a new VLN agent that learns to navigate by decomposing long instructions into shorter ones (BabySteps) and completing them sequentially.
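The decomposition idea can be illustrated with a minimal sketch. This is not BabyWalk's learned segmentation model; it is a crude heuristic stand-in that splits a long instruction into BabySteps at common connectives, and the function name is illustrative only.

```python
import re

def decompose_instruction(instruction):
    """Split a long navigation instruction into short sub-instructions
    (BabySteps) at connectives like ', then' and 'and then' -- a crude
    heuristic stand-in for BabyWalk's learned decomposition."""
    parts = re.split(r',\s*(?:then\s+)?|\s+and then\s+|\.\s*', instruction)
    return [p.strip() for p in parts if p.strip()]

steps = decompose_instruction(
    "Walk down the stairs, then turn left and then stop at the brown sofa."
)
# Each BabyStep would then be handed to the agent and completed in order.
```

In the paper itself the agent is trained with curriculum reinforcement learning to complete such sub-instructions sequentially while attending to its past trajectory.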
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation
Outdoor vision-and-language navigation (VLN) is a task in which an agent follows natural language instructions to navigate a real-life urban environment.
Language and Visual Entity Relationship Graph for Agent Navigation
From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment.
Sim-to-Real Transfer for Vision-and-Language Navigation
We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.
A Recurrent Vision-and-Language BERT for Navigation
In this paper we propose a recurrent BERT model that is time-aware for use in VLN.
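The recurrence can be sketched abstractly: instead of re-encoding the full history at every step, the agent carries a single state forward in time and updates it with each new observation. This is a minimal illustration of the control flow only, not the paper's transformer implementation; the class and parameter names are assumptions.

```python
class RecurrentVLNAgent:
    """Toy sketch of a time-aware recurrent navigation agent:
    one state is updated step by step rather than recomputed."""

    def __init__(self, encode, policy):
        self.encode = encode  # (state, observation) -> new state
        self.policy = policy  # state -> action

    def run(self, init_state, observations):
        state, actions = init_state, []
        for obs in observations:
            # The recurrent state plays the role of the model's
            # carried-over token, summarizing the trajectory so far.
            state = self.encode(state, obs)
            actions.append(self.policy(state))
        return actions
```

With stand-in functions for `encode` and `policy`, each step's action depends on the accumulated state, which is the property the recurrent design provides for navigation.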
Diagnosing Vision-and-Language Navigation: What Really Matters
Results show that indoor navigation agents refer to both object and direction tokens when making decisions.