Visual Navigation
105 papers with code • 6 benchmarks • 16 datasets
Visual Navigation is the problem of navigating an agent, e.g. a mobile robot, in an environment using camera input only. The agent is given a target image (the image it will see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions chosen from the camera observations alone.
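The loop described above (observe, compare against the target image, act, repeat) can be sketched in a few lines. This is a toy illustration, not any particular paper's method: grid coordinates stand in for camera images, and a hand-written rule stands in for the learned policy that a real agent would use.

```python
from dataclasses import dataclass


@dataclass
class GridEnv:
    """Toy environment: the 'camera image' is stood in for by the (x, y) cell."""
    size: int = 5
    pos: tuple = (0, 0)
    goal: tuple = (4, 3)

    ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

    def observe(self):
        return self.pos  # stand-in for the current camera observation

    def target_image(self):
        return self.goal  # stand-in for the image seen from the target position

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        return self.observe()


def policy(obs, target):
    """Hand-written stand-in for a learned policy network pi(obs, target) -> action."""
    if obs[0] != target[0]:
        return "right" if obs[0] < target[0] else "left"
    return "up" if obs[1] < target[1] else "down"


def navigate(env, max_steps=50):
    """Image-goal navigation loop: act until the observation matches the target."""
    target = env.target_image()
    obs = env.observe()
    for t in range(max_steps):
        if obs == target:
            return t  # number of actions taken
        obs = env.step(policy(obs, target))
    return None  # failed to reach the goal within the step budget
```

For example, `navigate(GridEnv())` drives the agent from `(0, 0)` to `(4, 3)` in 7 steps. A real visual-navigation agent replaces the coordinate comparison and the hand-written rule with learned perception and control over raw images.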
Source: Vision-based Navigation Using Deep Reinforcement Learning
Libraries
Use these libraries to find Visual Navigation models and implementations
Latest papers
Offline Reinforcement Learning for Visual Navigation
Reinforcement learning can enable robots to navigate to distant goals while optimizing user-specified reward functions, including preferences for following lanes, staying on paved paths, or avoiding freshly mowed grass.
BEVBert: Multimodal Map Pre-training for Language-guided Navigation
Concretely, we build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map.
Last-Mile Embodied Visual Navigation
Realistic long-horizon tasks like image-goal navigation involve exploratory and exploitative phases.
Towards Versatile Embodied Navigation
With the emergence of varied visual navigation tasks (e.g., image-goal, object-goal, audio-goal, and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents that handle individual navigation tasks well.
ViNL: Visual Navigation and Locomotion Over Obstacles
ViNL consists of: (1) a visual navigation policy that outputs linear and angular velocity commands that guide the robot to a goal coordinate in unfamiliar indoor environments; and (2) a visual locomotion policy that controls the robot's joints to avoid stepping on obstacles while following the provided velocity commands.
Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
Our resulting HM3D-AutoVLN dataset is an order of magnitude larger than existing VLN datasets in terms of navigation environments and instructions.
Visual Pre-training for Navigation: What Can We Learn from Noise?
One powerful paradigm in visual navigation is to predict actions from observations directly.
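The observation-to-action paradigm can be made concrete with a minimal sketch: fit a policy on demonstration pairs and query it on a new observation. This is an illustrative toy, not the paper's method; a 1-nearest-neighbour lookup over hypothetical 2-D feature vectors stands in for the deep network a real system would train on images.

```python
# Demonstration pairs: (observation feature vector, expert action).
# The features and actions here are made up for illustration.
demos = [
    ((0.0, 0.0), "forward"),
    ((1.0, 0.0), "left"),
    ((0.0, 1.0), "right"),
]


def predict_action(obs):
    """Predict an action directly from an observation via nearest demonstration."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Return the expert action attached to the closest stored observation.
    return min(demos, key=lambda d: sq_dist(d[0], obs))[1]
```

Querying with an unseen observation, e.g. `predict_action((0.9, 0.1))`, returns the action of its nearest demonstration (`"left"`); learned policies generalize in the same observation-in, action-out shape, just with a network instead of a lookup.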
Good Time to Ask: A Learning Framework for Asking for Help in Embodied Visual Navigation
In reality, it is often more efficient to ask for help than to search the entire space to find an object with an unknown location.
What do navigation agents learn about their environment?
We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio rendering for 3D environments.