Visual Navigation
107 papers with code • 6 benchmarks • 16 datasets
Visual Navigation is the problem of navigating an agent, e.g. a mobile robot, through an environment using camera input only. The agent is given a target image (the view it will see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions, based solely on its camera observations.
Source: Vision-based Navigation Using Deep Reinforcement Learning
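The observation–action loop described above can be sketched with a toy environment. This is a minimal, hypothetical example, not any paper's method: the "camera image" is reduced to a grid coordinate, and the Gym-style `reset`/`step` interface is modeled loosely on simulators such as Habitat or AI2-THOR.

```python
# Toy stand-in for a visual-navigation environment: the "camera image"
# is just the agent's (x, y) cell, and the target image is the goal cell.
# Real simulators return RGB frames instead of coordinates.
class ToyNavEnv:
    ACTIONS = ["up", "down", "left", "right"]

    def __init__(self, size=5):
        self.size = size
        self.goal = (size - 1, size - 1)

    def reset(self):
        self.pos = (0, 0)
        # Return current "image" and target "image", as in the task setup.
        return self.pos, self.goal

    def step(self, action):
        x, y = self.pos
        if action == "up":    y = min(y + 1, self.size - 1)
        if action == "down":  y = max(y - 1, 0)
        if action == "right": x = min(x + 1, self.size - 1)
        if action == "left":  x = max(x - 1, 0)
        self.pos = (x, y)
        done = self.pos == self.goal
        return self.pos, done

# Trivial hand-coded policy: greedily close the gap to the target.
# A learned agent would replace this with a network over image features.
def greedy_policy(obs, target):
    (x, y), (gx, gy) = obs, target
    if x < gx: return "right"
    if x > gx: return "left"
    if y < gy: return "up"
    return "down"

env = ToyNavEnv()
obs, target = env.reset()
steps, done = 0, False
while not done and steps < 100:
    obs, done = env.step(greedy_policy(obs, target))
    steps += 1
print(f"reached goal in {steps} steps")  # 8 steps on a 5x5 grid
```

The loop terminates when the current observation matches the target; learned agents optimize the same loop with reinforcement or imitation learning over real image observations.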
Latest papers
What do navigation agents learn about their environment?
We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio rendering for 3D environments.
Zero-shot object goal visual navigation
Object goal visual navigation is a challenging task that aims to guide a robot to find a target object based on its visual observations, where the target is typically limited to the classes pre-defined in the training stage.
Contrastive Learning for Image Registration in Visual Teach and Repeat Navigation
Visual teach and repeat navigation (VT&R) is popular in robotics thanks to its simplicity and versatility.
A Visual Navigation Perspective for Category-Level Object Pose Estimation
In this paper, we take a deeper look at the inference of analysis-by-synthesis from the perspective of visual navigation, and investigate what is a good navigation policy for this specific task.
Benchmarking Visual Localization for Autonomous Navigation
The experimental part of the paper studies the effects of four such variables by evaluating state-of-the-art visual localization methods as part of the motion planning module of an autonomous navigation stack.
HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation
Pre-training has been adopted in a few recent works on Vision-and-Language Navigation (VLN).
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
To balance the complexity of large action space reasoning and fine-grained language grounding, we dynamically combine a fine-scale encoding over local observations and a coarse-scale encoding on a global map via graph transformers.
Sound Adversarial Audio-Visual Navigation
In this work, we design an acoustically complex environment in which, besides the target sound, there exists a sound attacker playing a zero-sum game with the agent.
RARA: Zero-shot Sim2Real Visual Navigation with Following Foreground Cues
In this work, we tackle this gap for the specific case of camera-based navigation, formulating it as following a visual cue in the foreground with arbitrary backgrounds.