Visual Navigation is the problem of navigating an agent, e.g. a mobile robot, through an environment using camera input only. The agent is given a target image (an image it will see from the target position) and must move from its current position to the target by applying a sequence of actions, based solely on its camera observations.
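In code, the task reduces to a closed-loop interaction of roughly the following shape. This is a minimal sketch: the `ImageGoalEnv` class, its methods, and the discrete action set are hypothetical stand-ins for whatever simulator or robot interface is actually used (benchmarks such as Habitat or Room-to-Room define their own interfaces).

```python
import numpy as np

ACTIONS = ("move_forward", "turn_left", "turn_right", "stop")

class ImageGoalEnv:
    """Hypothetical stand-in for a simulator or robot interface."""

    def reset(self) -> tuple[np.ndarray, np.ndarray]:
        obs = np.zeros((256, 256, 3), dtype=np.uint8)   # first camera frame
        goal = np.zeros((256, 256, 3), dtype=np.uint8)  # target image
        return obs, goal

    def step(self, action: str) -> tuple[np.ndarray, bool]:
        assert action in ACTIONS
        obs = np.zeros((256, 256, 3), dtype=np.uint8)   # next camera frame
        return obs, action == "stop"                    # episode ends on "stop"

def navigate(env: ImageGoalEnv, policy) -> None:
    obs, goal = env.reset()
    done = False
    while not done:
        action = policy(obs, goal)  # action conditioned on current view and goal image
        obs, done = env.step(action)
```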
In principle, meta-reinforcement learning approaches can exploit this shared structure, but in practice, they fail to adapt to new environments when adaptation requires targeted exploration (e.g., exploring the cabinets to find ingredients in a new kitchen).
In this paper we propose a method for informed visual navigation: a learned visual similarity operator guides the robot's visual search towards parts of the scene that resemble an exemplar image, which the user provides as a high-level specification for data collection.
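As a rough illustration of such a similarity operator (a sketch, not the paper's implementation), one can embed the exemplar and the current view with a shared backbone and score every spatial location by cosine similarity; `encoder` below is an assumed convolutional feature extractor:

```python
import torch
import torch.nn.functional as F

def similarity_heatmap(encoder, view: torch.Tensor, exemplar: torch.Tensor) -> torch.Tensor:
    """Score each location of `view` against `exemplar`; both are (1, 3, H, W) images."""
    feat = encoder(view)                     # (1, C, h, w) dense features
    ex = encoder(exemplar).mean(dim=(2, 3))  # (1, C) pooled exemplar embedding
    feat = F.normalize(feat, dim=1)
    ex = F.normalize(ex, dim=1)[..., None, None]
    return (feat * ex).sum(dim=1)            # (1, h, w) cosine-similarity map
```

High-scoring regions of the map can then be used to bias where the robot looks or moves next.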
We propose a novel framework for navigation around humans which combines learning-based perception with model-based optimal control.
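One minimal way to picture this division of labor (a sketch under simplifying assumptions, not the paper's planner): a learned perception module proposes a waypoint from the image, and a model-based controller, here a simple proportional law on a unicycle model, tracks it.

```python
import numpy as np

def control_to_waypoint(pose, waypoint, k_v=0.5, k_w=1.0):
    """pose = (x, y, theta); waypoint = (x, y); returns (v, omega) commands."""
    dx, dy = waypoint[0] - pose[0], waypoint[1] - pose[1]
    heading_err = np.arctan2(dy, dx) - pose[2]
    heading_err = np.arctan2(np.sin(heading_err), np.cos(heading_err))  # wrap to [-pi, pi]
    v = k_v * np.hypot(dx, dy) * np.cos(heading_err)  # slow down when misaligned
    omega = k_w * heading_err
    return v, omega
```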
The formulation is designed to identify and disregard dynamic objects in order to obtain a medium-term invariant map representation.
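A hedged sketch of this idea: segment each incoming frame, drop pixels belonging to dynamic classes before fusing the frame into the map, and the map converges to the static scene. The class ids below are illustrative (Cityscapes-style train ids for person, rider, and car); the actual label set depends on the segmentation model.

```python
import numpy as np

DYNAMIC_CLASSES = (11, 12, 13)  # illustrative: person, rider, car

def static_mask(semantic: np.ndarray) -> np.ndarray:
    """semantic: (H, W) array of class ids -> boolean mask of static pixels."""
    return ~np.isin(semantic, DYNAMIC_CLASSES)

def integrate_frame(map_update_fn, depth: np.ndarray, semantic: np.ndarray):
    mask = static_mask(semantic)
    filtered_depth = np.where(mask, depth, np.nan)  # disregard dynamic pixels
    return map_update_fn(filtered_depth)
```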
Our experimental results, on traversals of the Oxford RobotCar dataset with no GPS data, show that MVP can achieve 53% and 93% navigation success rates using VO and RO, respectively, compared to 7% for a vision-only method.
By training on a large number of image-text-action triplets in a self-supervised manner, the pre-trained model provides generic representations of visual environments and language instructions.
Ranked #1 on Visual Navigation on Room-to-Room
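One plausible objective over such image-text-action triplets (a simplified sketch; pre-training of this kind typically combines several objectives, e.g. masked language modeling alongside action prediction) is to predict the recorded action from the paired image and instruction. The encoder modules and dimensions here are hypothetical.

```python
import torch
import torch.nn as nn

class TripletActionPredictor(nn.Module):
    """Fuse image and instruction encodings and predict the recorded action."""

    def __init__(self, vision_enc: nn.Module, text_enc: nn.Module,
                 dim: int, num_actions: int):
        super().__init__()
        self.vision_enc, self.text_enc = vision_enc, text_enc
        self.head = nn.Linear(2 * dim, num_actions)

    def forward(self, images: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        v = self.vision_enc(images)  # (B, dim) visual encoding
        t = self.text_enc(tokens)    # (B, dim) instruction encoding
        return self.head(torch.cat([v, t], dim=-1))  # (B, num_actions) logits

loss_fn = nn.CrossEntropyLoss()  # supervised by the action ids from the triplets
```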
When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights.
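The standard recipe behind this observation is transfer learning: load pre-trained weights, optionally freeze the backbone, and train a fresh task head. A minimal PyTorch example (the 10-class head is a placeholder for the new task):

```python
import torch.nn as nn
from torchvision import models

# Start from ImageNet weights instead of random initialization.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False  # freeze the pre-trained backbone

# Replace the classifier with a trainable head for the new task.
model.fc = nn.Linear(model.fc.in_features, 10)
```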
We find that SRCC for Habitat as used for the CVPR19 challenge is low (0.18 for the success metric), which suggests that performance improvements for this simulator-based challenge would not transfer well to a physical robot.
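For orientation, SRCC here measures how well per-method performance in simulation predicts performance on the real robot. A minimal illustration using a Spearman rank correlation (the exact statistic is an assumption here, and the numbers are placeholders, not data from the paper):

```python
from scipy.stats import spearmanr

sim_success = [0.61, 0.74, 0.55, 0.80]   # success rates in simulation (placeholder)
real_success = [0.40, 0.35, 0.42, 0.38]  # success rates on the robot (placeholder)

rho, pval = spearmanr(sim_success, real_success)
print(f"SRCC = {rho:.2f}")  # a low value means simulator gains may not transfer
```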