Visual Navigation
105 papers with code • 6 benchmarks • 16 datasets
Visual Navigation is the problem of navigating an agent, e.g. a mobile robot, through an environment using camera input only. The agent is given a target image (the view it would see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions chosen from its camera observations alone.
Source: Vision-based Navigation Using Deep Reinforcement Learning
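The task definition above can be made concrete with a small sketch. The toy grid world below is purely illustrative and is not taken from any of the listed papers: the class `ToyVisualNavEnv`, the 3x3 "camera" patch, the four-action set, and the random policy are all assumptions chosen to show the observation / target image / action loop.

```python
import numpy as np

# Hypothetical discrete action set: up, down, left, right.
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


class ToyVisualNavEnv:
    """Illustrative grid world; the 'camera' sees a 3x3 patch of a textured floor."""

    def __init__(self, size=6, seed=0):
        rng = np.random.default_rng(seed)
        self.size = size
        # Texture is padded by one cell on each side so every patch is 3x3.
        self.texture = rng.integers(0, 256, size=(size + 2, size + 2), dtype=np.uint8)
        self.pos = (0, 0)
        self.target_image = None

    def render(self, pos):
        """Return the camera image the agent would see at grid position `pos`."""
        r, c = pos
        return self.texture[r:r + 3, c:c + 3].copy()

    def reset(self, start, target):
        """Place the agent and return (current observation, target image)."""
        self.pos = start
        self.target_image = self.render(target)
        return self.render(self.pos), self.target_image

    def step(self, action):
        """Apply one action; the agent only ever receives images, never coordinates."""
        dr, dc = ACTIONS[action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        obs = self.render(self.pos)
        # Simplification: the episode ends when the view matches the target image.
        done = np.array_equal(obs, self.target_image)
        return obs, done


def make_random_policy(seed=0):
    """Placeholder policy; a learned policy would map (obs, goal) to an action."""
    rng = np.random.default_rng(seed)

    def policy(obs, goal):
        return int(rng.integers(0, 4))

    return policy


def run_episode(env, policy, start, target, max_steps=200):
    """Roll out a policy that sees only camera observations and the target image."""
    obs, goal = env.reset(start, target)
    if np.array_equal(obs, goal):
        return True, 0
    for t in range(1, max_steps + 1):
        obs, done = env.step(policy(obs, goal))
        if done:
            return True, t
    return False, max_steps


if __name__ == "__main__":
    env = ToyVisualNavEnv()
    success, steps = run_episode(env, make_random_policy(), (0, 0), (4, 4))
    print(success, steps)
```

The policy signature `policy(obs, goal)` is the point of the sketch: the agent's position is kept inside the environment and never exposed, so any policy plugged in must act on images alone.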
Most implemented papers
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic unknown environments.
Learning Exploration Policies for Navigation
Numerous past works have tackled the problem of task-driven navigation.
Scaling and Benchmarking Self-Supervised Visual Representation Learning
Self-supervised learning aims to learn representations from the data itself without explicit manual supervision.
An Open Source and Open Hardware Deep Learning-powered Visual Navigation Engine for Autonomous Nano-UAVs
Nano-size unmanned aerial vehicles (UAVs), with a diameter of a few centimeters and a total power budget below 10 W, have so far been considered incapable of running sophisticated vision-based autonomous navigation software without external aid from base stations, ad hoc local positioning infrastructure, and powerful external computation servers.
Vision-and-Dialog Navigation
To train agents that search an environment for a goal location, we define the Navigation from Dialog History task.
VUSFA: Variational Universal Successor Features Approximator to Improve Transfer DRL for Target Driven Visual Navigation
In this paper, we show how novel transfer reinforcement learning techniques can be applied to the complex task of target driven navigation using the photorealistic AI2THOR simulator.
SoundSpaces: Audio-Visual Navigation in 3D Environments
Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted solely to their visual perception of the environment.
Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks
When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights.
Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices
Learning a new task often requires both exploring to gather task-relevant information and exploiting this information to solve the task.
Visual Navigation in Real-World Indoor Environments Using End-to-End Deep Reinforcement Learning
This precludes the use of the learned policy on a real robot.