Vision and Language Navigation
85 papers with code • 5 benchmarks • 13 datasets
Libraries
Use these libraries to find Vision and Language Navigation models and implementations.
Latest papers
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation
Most Vision-and-Language Navigation (VLN) algorithms tend to make decision errors, primarily due to a lack of visual common sense and insufficient reasoning capabilities.
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.
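The VLN setup described above can be sketched as a simple perceive-act episode loop. This is a minimal toy illustration with hypothetical interfaces (`ToyEnv`, `run_episode` are invented here; real benchmarks such as R2R expose panoramic visual observations and richer action spaces):

```python
# Minimal sketch of a VLN episode loop (hypothetical interfaces, not a
# real benchmark API): the agent repeatedly observes, picks an action
# conditioned on the instruction, and stops when it believes it arrived.

class ToyEnv:
    """A toy 1-D corridor: the agent should reach position `goal`."""
    def __init__(self, goal=3):
        self.pos = 0
        self.goal = goal

    def observe(self):
        # Stand-in for a visual observation (e.g. a panoramic RGB frame).
        return {"position": self.pos}

    def step(self, action):
        if action == "forward":
            self.pos += 1
        # Episode ends when the agent emits "stop".
        return action == "stop"

def run_episode(env, instruction, max_steps=10):
    """Follow `instruction` until the agent emits 'stop' or steps run out."""
    for _ in range(max_steps):
        obs = env.observe()
        # A real agent would ground the instruction in the observation;
        # this toy policy just walks forward until it reaches the goal.
        action = "stop" if obs["position"] >= env.goal else "forward"
        if env.step(action):
            break
    return env.pos

final_pos = run_episode(ToyEnv(goal=3), "walk to the end of the corridor")
```

The point of the sketch is the interaction structure, not the policy: in actual VLN agents the action choice is produced by a learned model conditioned on both the instruction and the visual observation history.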
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion.
NavHint: Vision and Language Navigation Agent with a Hint Generator
The hint generator assists the navigation agent in developing a global understanding of the visual environment.
WebVLN: Vision-and-Language Navigation on Websites
The Vision-and-Language Navigation (VLN) task aims to enable AI agents to accurately understand and follow natural language instructions to navigate through real-world environments, ultimately reaching specific target locations.
Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation
To address this problem, we propose a novel Grounded Entity-Landmark Adaptive (GELA) pre-training paradigm for VLN tasks.
March in Chat: Interactive Prompting for Remote Embodied Referring Expression
Nevertheless, this poses more challenges than other VLN tasks, since it requires agents to infer a navigation plan based only on a short instruction.
VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation
Performance on Vision-and-Language Navigation (VLN) tasks has progressed rapidly in recent years thanks to the use of large pre-trained vision-and-language models.
AerialVLN: Vision-and-Language Navigation for UAVs
Navigating in the sky is more complicated than on the ground, because agents must account for flying height and reason about more complex spatial relationships.
Scaling Data Generation in Vision-and-Language Navigation
Recent research in language-guided visual navigation has highlighted the need for diverse traversable environments and large quantities of supervision when training generalizable agents.