The accumulated belief of the world enables the agent to track visited regions of the environment.
We propose using high-level semantic and contextual features, including segmentation and detection masks obtained from off-the-shelf, state-of-the-art vision models, as observations, and use a deep network to learn the navigation policy.
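A minimal sketch of the idea above, not the paper's actual architecture: a policy network that consumes semantic segmentation and detection masks as multi-channel image observations and outputs a distribution over discrete navigation actions. The channel count, action count, layer sizes, and class names are illustrative assumptions.

```python
# Sketch: navigation policy over semantic-mask observations (all sizes are assumptions).
import torch
import torch.nn as nn

class SemanticMaskPolicy(nn.Module):
    def __init__(self, num_mask_channels=41, num_actions=4):
        super().__init__()
        # Encode stacked one-hot segmentation / detection masks into a feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(num_mask_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.policy_head = nn.Linear(64, num_actions)

    def forward(self, masks):
        # masks: (batch, num_mask_channels, H, W), produced by an off-the-shelf vision model.
        features = self.encoder(masks)
        return torch.distributions.Categorical(logits=self.policy_head(features))

# Usage: sample one action for a single 128x128 observation.
obs = torch.zeros(1, 41, 128, 128)
action = SemanticMaskPolicy()(obs).sample()
```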
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments.
We find that SRCC for Habitat as used for the CVPR19 challenge is low (0.18 for the success metric), which suggests that performance improvements for this simulator-based challenge would not transfer well to a physical robot.
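A minimal sketch of the kind of sim-to-real correlation check described above: correlate a metric (e.g., success rate) measured in simulation against the same metric measured on a physical robot, across a set of models. The Pearson formulation and the numbers below are illustrative assumptions, not the paper's data; a low coefficient means simulator gains need not transfer.

```python
# Sketch: correlation between simulated and real-world success rates (assumed data).
import numpy as np

sim_success  = np.array([0.55, 0.62, 0.71, 0.78, 0.90])  # per-model, in simulation
real_success = np.array([0.60, 0.48, 0.65, 0.52, 0.58])  # same models, on the robot

# Pearson correlation coefficient between the two rankings of models.
srcc = np.corrcoef(sim_success, real_success)[0, 1]
print(f"sim-to-real correlation: {srcc:.2f}")
```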
Nano-size unmanned aerial vehicles (UAVs), a few centimeters in diameter and with a sub-10 W total power budget, have so far been considered incapable of running sophisticated vision-based autonomous navigation software without external aid from base stations, ad-hoc local positioning infrastructure, and powerful external computation servers.
As part of our general methodology we discuss the software mapping techniques that enable the state-of-the-art deep convolutional neural network presented in prior work to be fully executed on-board within a strict 6 fps real-time constraint with no compromise in terms of flight results, while all processing is done with only 64 mW on average.
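A back-of-envelope check of the on-board budget quoted above, using only the stated numbers: at 6 fps each inference must finish within roughly 167 ms, and at 64 mW average power the energy spent per processed frame is about 64 mW / 6 Hz ≈ 10.7 mJ.

```python
# Sketch: per-frame time and energy budget implied by the quoted constraints.
fps = 6            # real-time constraint, frames per second
avg_power_mw = 64  # average processing power, milliwatts

frame_budget_ms = 1000.0 / fps            # time available per frame
energy_per_frame_mj = avg_power_mw / fps  # mW / Hz = mJ per frame

print(f"per-frame time budget: {frame_budget_ms:.1f} ms")   # ~166.7 ms
print(f"energy per frame: {energy_per_frame_mj:.1f} mJ")    # ~10.7 mJ
```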
This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.
As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.