Vision and Language Navigation
88 papers with code • 5 benchmarks • 13 datasets
Latest papers with no code
VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation
Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through realistic 3D outdoor environments based on natural language instructions.
MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation
Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-making and generalization abilities across various tasks.
Which way is `right'?: Uncovering limitations of Vision-and-Language Navigation models
The challenging task of Vision-and-Language Navigation (VLN) requires embodied agents to follow natural language instructions to reach a goal location or object (e.g., `walk down the hallway and turn left at the piano').
DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation
DAP introduces soft visual prompts in the input space of the visual encoder of a pretrained model.
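For readers unfamiliar with input-space visual prompt tuning, the sketch below shows one common way such soft prompts are implemented: learnable tokens prepended to the patch embeddings of a frozen visual encoder. The module and parameter names are illustrative and are not taken from the DAP codebase.

```python
import torch
import torch.nn as nn

class SoftVisualPrompts(nn.Module):
    """Learnable prompt tokens prepended to the patch embeddings of a frozen visual encoder."""

    def __init__(self, num_prompts: int = 10, embed_dim: int = 768):
        super().__init__()
        # The prompts are the only parameters updated during tuning; the backbone stays frozen.
        self.prompts = nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02)

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_embeddings: (batch, num_patches, embed_dim) from the frozen patch-embedding layer.
        batch_size = patch_embeddings.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch_size, -1, -1)
        # The transformer blocks then attend over prompts and patch tokens jointly.
        return torch.cat([prompts, patch_embeddings], dim=1)

# Usage sketch: keep the pretrained encoder frozen and optimize only the prompts.
# prompter = SoftVisualPrompts(num_prompts=10, embed_dim=768)
# optimizer = torch.optim.AdamW(prompter.parameters(), lr=1e-3)
```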
Does VLN Pretraining Work with Nonsensical or Irrelevant Instructions?
Data augmentation via back-translation is common when pretraining Vision-and-Language Navigation (VLN) models, even though the generated instructions are noisy.
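As a rough illustration of what back-translation augmentation looks like in this setting, the sketch below pairs sampled trajectories with instructions generated by a "speaker" model; `env_sampler` and `speaker` are hypothetical placeholders rather than components of any specific paper's code.

```python
def augment_with_back_translation(env_sampler, speaker, num_trajectories: int) -> list:
    """Generate synthetic (instruction, path) pairs by verbalizing sampled trajectories."""
    augmented = []
    for _ in range(num_trajectories):
        trajectory = env_sampler.sample_path()      # e.g., a shortest path between two random nodes
        instruction = speaker.generate(trajectory)  # speaker model describes the path in language
        augmented.append({"instruction": instruction, "path": trajectory})
    return augmented
```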
Test-time Adaptive Vision-and-Language Navigation
Update components computed at test time are adaptively accumulated to pinpoint a concordant direction for fast model adaptation.
Vision and Language Navigation in the Real World via Online Visual Language Mapping
Directly transferring SOTA navigation policies trained in simulation to the real world is challenging due to the visual domain gap and the absence of prior knowledge about unseen environments.
LangNav: Language as a Perceptual Representation for Navigation
We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings.
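The sketch below illustrates the general idea of using language as the perceptual interface: visual observations are verbalized (e.g., by an off-the-shelf captioner) and a language model chooses the next action from text alone. All function names are hypothetical placeholders, not the LangNav API.

```python
def describe_observation(view_captions: dict) -> str:
    """Turn per-direction captions (e.g., from an off-the-shelf captioner) into one textual observation."""
    return "\n".join(f"{direction}: {caption}" for direction, caption in view_captions.items())

def build_prompt(instruction: str, history: list, observation: str, actions: list) -> str:
    """Assemble a text-only prompt from which a language model can pick the next action."""
    past = " -> ".join(history) if history else "none"
    return (
        f"Instruction: {instruction}\n"
        f"Actions so far: {past}\n"
        f"Current observation:\n{observation}\n"
        f"Choose the next action from: {', '.join(actions)}."
    )

# Example usage with a hypothetical language-model call:
# prompt = build_prompt(
#     "Walk down the hallway and turn left at the piano.",
#     history=["move forward"],
#     observation=describe_observation({"ahead": "a long hallway with open doors",
#                                       "left": "a grand piano beside a window"}),
#     actions=["move forward", "turn left", "turn right", "stop"],
# )
# next_action = language_model.generate(prompt)  # hypothetical LM call
```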
Evaluating Explanation Methods for Vision-and-Language Navigation
The ability to navigate robots with natural language instructions in an unknown environment is a crucial step for achieving embodied artificial intelligence (AI).
Prompt-based Context- and Domain-aware Pretraining for Vision and Language Navigation
In the indoor-aware stage, we apply an efficient tuning paradigm to learn deep visual prompts from an indoor dataset, in order to augment pretrained models with inductive biases towards indoor environments.
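"Deep" visual prompts differ from input-space prompts in that layer-specific learnable tokens are injected at every transformer layer of the frozen encoder. The sketch below shows this pattern in a minimal form, assuming a standard ViT-style backbone; layer counts, dimensions, and names are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class DeepVisualPrompts(nn.Module):
    """Layer-specific learnable prompt tokens for a frozen transformer encoder."""

    def __init__(self, num_layers: int = 12, num_prompts: int = 5, embed_dim: int = 768):
        super().__init__()
        self.prompts = nn.ParameterList(
            [nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02) for _ in range(num_layers)]
        )

    def inject(self, tokens: torch.Tensor, layer_idx: int) -> torch.Tensor:
        # tokens: (batch, seq_len, embed_dim) entering transformer layer `layer_idx`.
        batch_size = tokens.size(0)
        layer_prompts = self.prompts[layer_idx].unsqueeze(0).expand(batch_size, -1, -1)
        if layer_idx == 0:
            # Before the first layer, prepend prompts to the patch tokens.
            return torch.cat([layer_prompts, tokens], dim=1)
        # At deeper layers, overwrite the prompt slots with this layer's fresh prompts.
        num_prompts = layer_prompts.size(1)
        return torch.cat([layer_prompts, tokens[:, num_prompts:, :]], dim=1)
```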